{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recommender Systems 2020/21\n",
    "\n",
    "### Practice 2 - Non-personalized recommenders"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We will use the MovieLens 10M dataset. We download the archive and extract only the file we need.\n",
    "\n",
    "### To reuse these steps in the future, we will put all of them in a class that we can call easily"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.request import urlretrieve\n",
    "import zipfile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "data_file_path = \"data/Movielens_10M/\"\n",
    "data_file_name = data_file_path + \"movielens_10m.zip\"\n",
    "\n",
    "# If the directory does not exist, create it\n",
    "os.makedirs(data_file_path, exist_ok=True)\n",
    "\n",
    "# If the file already exists, skip the download\n",
    "if not os.path.exists(data_file_name):\n",
    "    urlretrieve(\"http://files.grouplens.org/datasets/movielens/ml-10m.zip\", data_file_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataFile = zipfile.ZipFile(data_file_name)\n",
    "\n",
    "# Extract only the ratings file; extract() returns the path of the extracted file\n",
    "URM_path = dataFile.extract(\"ml-10M100K/ratings.dat\", path=data_file_path)\n",
    "\n",
    "URM_file = open(URM_path, 'r')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "_io.TextIOWrapper"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(URM_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's take a look at the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The \"::\" separator is longer than one character, so pandas needs the python engine.\n",
    "# Requesting it explicitly avoids the ParserWarning about falling back from the C engine.\n",
    "URM_all_dataframe = pd.read_csv(filepath_or_buffer=URM_path,\n",
    "                                sep=\"::\",\n",
    "                                header=None,\n",
    "                                dtype={0:int, 1:int, 2:float, 3:int},\n",
    "                                engine=\"python\")\n",
    "\n",
    "URM_all_dataframe.columns = [\"UserID\", \"ItemID\", \"Interaction\", \"Timestamp\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>UserID</th>\n",
       "      <th>ItemID</th>\n",
       "      <th>Interaction</th>\n",
       "      <th>Timestamp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>122</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838985046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>185</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>231</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>292</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>316</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1</td>\n",
       "      <td>329</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>355</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838984474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>356</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983653</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1</td>\n",
       "      <td>362</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838984885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1</td>\n",
       "      <td>364</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983707</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   UserID  ItemID  Interaction  Timestamp\n",
       "0       1     122          5.0  838985046\n",
       "1       1     185          5.0  838983525\n",
       "2       1     231          5.0  838983392\n",
       "3       1     292          5.0  838983421\n",
       "4       1     316          5.0  838983392\n",
       "5       1     329          5.0  838983392\n",
       "6       1     355          5.0  838984474\n",
       "7       1     356          5.0  838983653\n",
       "8       1     362          5.0  838984885\n",
       "9       1     364          5.0  838983707"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URM_all_dataframe.head(n=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The number of interactions is 10000054\n"
     ]
    }
   ],
   "source": [
    "print (\"The number of interactions is {}\".format(len(URM_all_dataframe)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We can use this data to create a sparse matrix. Notice that we have read UserID and ItemID as int\n",
    "### This is not always possible, for example when the IDs are alphanumeric"
   ]
  },
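  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: a minimal sketch for alphanumeric IDs\n",
    "\n",
    "This cell is not part of the original practice. When IDs cannot be read as int, one option is `pd.factorize`, which assigns consecutive integer indices in order of first appearance. The toy IDs and the `toy_` variable names below are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical alphanumeric user IDs, made up for this example\n",
    "toy_user_ids = pd.Series([\"u_a7\", \"u_k2\", \"u_a7\", \"u_z9\"])\n",
    "\n",
    "# factorize returns the integer index of each entry and the array of unique original IDs\n",
    "toy_user_index, toy_user_unique = pd.factorize(toy_user_ids)\n",
    "\n",
    "print(toy_user_index)\n",
    "print(list(toy_user_unique))"
   ]
  },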
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now we can extract the lists of unique user IDs and item IDs and display some statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "userID_unique = URM_all_dataframe[\"UserID\"].unique()\n",
    "itemID_unique = URM_all_dataframe[\"ItemID\"].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of items\t 10677, Number of users\t 69878\n",
      "Max ID items\t 65133, Max Id users\t 71567\n",
      "\n"
     ]
    }
   ],
   "source": [
    "n_users = len(userID_unique)\n",
    "n_items = len(itemID_unique)\n",
    "n_interactions = len(URM_all_dataframe)\n",
    "\n",
    "print (\"Number of items\\t {}, Number of users\\t {}\".format(n_items, n_users))\n",
    "print (\"Max ID items\\t {}, Max Id users\\t {}\\n\".format(max(itemID_unique), max(userID_unique)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Note that the maximum user and item IDs are higher than the number of unique values -> some IDs are never used (empty profiles)\n",
    "### We should remove the empty indices. To do so, we create a new mapping from original IDs to consecutive indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_original_ID_to_index_dict = {}\n",
    "\n",
    "for user_id in userID_unique:\n",
    "    user_original_ID_to_index_dict[user_id] = len(user_original_ID_to_index_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_original_ID_to_index_dict = {}\n",
    "\n",
    "for item_id in itemID_unique:\n",
    "    item_original_ID_to_index_dict[item_id] = len(item_original_ID_to_index_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "New index for item 292 is 3\n"
     ]
    }
   ],
   "source": [
    "original_item_ID = 292\n",
    "print(\"New index for item {} is {}\".format(original_item_ID, item_original_ID_to_index_dict[original_item_ID]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We now replace the IDs in the dataframe and we are ready to use the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "URM_all_dataframe[\"UserID\"] = [user_original_ID_to_index_dict[user_original] for user_original in\n",
    "                                      URM_all_dataframe[\"UserID\"].values]\n",
    "\n",
    "URM_all_dataframe[\"ItemID\"] = [item_original_ID_to_index_dict[item_original] for item_original in \n",
    "                                      URM_all_dataframe[\"ItemID\"].values]"
   ]
  },
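  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: the same replacement with `Series.map`\n",
    "\n",
    "A minimal sketch on toy data, not part of the original practice. A dictionary from original ID to index can also be applied with `Series.map`, which avoids writing the list comprehension by hand. The toy IDs and the `toy_` names are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical original IDs with gaps, made up for this example\n",
    "toy_ids = pd.Series([10, 42, 10, 7])\n",
    "\n",
    "# Consecutive indices in order of first appearance, as in the mapping built above\n",
    "toy_mapping = {original_id: index for index, original_id in enumerate(toy_ids.unique())}\n",
    "\n",
    "# map looks up every value of the Series in the dictionary\n",
    "print(toy_ids.map(toy_mapping).tolist())"
   ]
  },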
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>UserID</th>\n",
       "      <th>ItemID</th>\n",
       "      <th>Interaction</th>\n",
       "      <th>Timestamp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838985046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838984474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983653</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838984885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "      <td>5.0</td>\n",
       "      <td>838983707</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   UserID  ItemID  Interaction  Timestamp\n",
       "0       0       0          5.0  838985046\n",
       "1       0       1          5.0  838983525\n",
       "2       0       2          5.0  838983392\n",
       "3       0       3          5.0  838983421\n",
       "4       0       4          5.0  838983392\n",
       "5       0       5          5.0  838983392\n",
       "6       0       6          5.0  838984474\n",
       "7       0       7          5.0  838983653\n",
       "8       0       8          5.0  838984885\n",
       "9       0       9          5.0  838983707"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URM_all_dataframe.head(n=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of items\t 10677, Number of users\t 69878\n",
      "Max ID items\t 10676, Max Id users\t 69877\n",
      "\n",
      "Average interactions per user 143.11\n",
      "Average interactions per item 936.60\n",
      "\n",
      "Sparsity 98.66 %\n"
     ]
    }
   ],
   "source": [
    "userID_unique = URM_all_dataframe[\"UserID\"].unique()\n",
    "itemID_unique = URM_all_dataframe[\"ItemID\"].unique()\n",
    "\n",
    "n_users = len(userID_unique)\n",
    "n_items = len(itemID_unique)\n",
    "n_interactions = len(URM_all_dataframe)\n",
    "\n",
    "print (\"Number of items\\t {}, Number of users\\t {}\".format(n_items, n_users))\n",
    "print (\"Max ID items\\t {}, Max Id users\\t {}\\n\".format(max(itemID_unique), max(userID_unique)))\n",
    "print (\"Average interactions per user {:.2f}\".format(n_interactions/n_users))\n",
    "print (\"Average interactions per item {:.2f}\\n\".format(n_interactions/n_items))\n",
    "\n",
    "print (\"Sparsity {:.2f} %\".format((1-float(n_interactions)/(n_items*n_users))*100))\n"
   ]
  },
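  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: how the sparsity is computed\n",
    "\n",
    "The sparsity printed above is the fraction of empty cells in the user-item matrix, assuming each user-item pair appears at most once in the data:\n",
    "\n",
    "$$\\text{sparsity} = 1 - \\frac{n_{interactions}}{n_{users} \\cdot n_{items}}$$\n",
    "\n",
    "With the values above, $1 - \\frac{10000054}{69878 \\cdot 10677} \\approx 0.9866$, that is 98.66 %."
   ]
  },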
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Rating distribution over time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAERCAYAAAB2CKBkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAXBklEQVR4nO3df7RlZX3f8fdHfsRUQAgzJcoAg5aCJAUiExSbVAwsAqQrNJEsoCyJBBe1FZLU0kqStbRqs0xXNbVKKGGZKTVJwZBqhFWQJkTAVmgYKj+NkAmITqGLIUSwWpGBb/84e+rhzv2x79yzz6/9fq11Fvfsvc85383c+3zO8+y9n52qQpLUXy+bdAGSpMkyCCSp5wwCSeo5g0CSes4gkKSeMwgkqedmMgiSbE7yZJIHWmx7WJJbktyX5NYkG8ZRoyTNipkMAuBq4LSW234Y+GRVHQN8APhQV0VJ0iyaySCoqtuBp4eXJXltks8luTvJF5Ic1aw6Gril+fnzwJljLFWSpt5MBsESrgIuqarjgUuBK5rl9wJvbX7+GWDfJAdOoD5Jmkp7TrqAUUiyD/Am4LokOxd/X/PfS4HLk7wduB34X8COcdcoSdNqLoKAQc/mG1V13MIVVfU48LPw/wPjrVX1zJjrk6SpNRdDQ1X1LPBokp8DyMCxzc/rkuzcz18BNk+oTEmaSjMZBEmuAe4AjkyyLcmFwHnAhUnuBR7keweFTwIeSvIwcBDw6xMoWZKmVpyGWpL6bSZ7BJKk0Zm5g8Xr1q2rjRs3TroMSZopd99991NVtX6xdTMXBBs3bmTLli2TLkOSZkqSx5Za59CQJPWcQSBJPWcQSFLPGQSS1HMGgST13MydNSRJvfK9iTRfaoQXA9sjkKRptVQIrLRulewRSNI0GWED35ZBIEmTNoHGf5hDQ5I0SRMOAbBHIEmTMQUBsJM9Akkat1GEwAjPGrJHIEnjMqpewIjvI2OPQJLGYUpDAOwRSFK3pmwYaDEGgSR1ZS0hMMbbCDs0JEldmJEQAHsEkjRaMxQAO9kjkKRRmcEQgA6DIMnmJE8meWCJ9eclua95fDHJsV3VIkmdSnY/BKomGgLQbY/gauC0ZdY/Cry5qo4BPghc1WEtktSNGe0FDOvsGEFV3Z5k4zLrvzj09E5gQ1e1SNLIzUEA7DQtxwguBG5aamWSi5JsSbJl+/btYyxLkhYxRyEAUxAESd7CIAjes9Q2VXVVVW2qqk3r168fX3GSNGwtxwJgKkMAJnz6aJJjgE8Ap1fVX02yFkla1hwGwE4T6xEkORT4NPC2qnp4UnVI0ormOASgwx5BkmuAk4B1SbYB7wP2AqiqK4H3AgcCV2TwP3lHVW3qqh5J2i1rOS10RnR51tC5K6x/B/COrj5fktZkznsBw5xiQpKG9SgAdpr4WUOSNDV6GAJgj0CSBnpwLGApBoGkfutpL2CYQSCpn9Z657A5CQEwCCT1zQzcOnLcDAJJ/WAALMkgkDTfDIAVefqopPllCLRij0DS/BlFAEAvQgAMAknzxl7AqhkEkuaDAbDbDAJJs89rAtbEIJA0uwyAkTAIJM0mp4YYGYNA0myxFzByBoGk2WAAdMYLyiRNP0OgU/YIJE0vA2AsDAJJ08drAsbKIJA0PQyAiTAIJE2ecwNNlEEgaXIMgKlgEEiaDIeBpoZBIGm87AVMHYNA0ngYAFPLIJDULQNg6hkEkrphAMwMg0DS6Iyq8QcDYIwMAklrZwDMNINA0uqMstFfyBCYCINA0uK6bPAXMgAmyiCQ9FIGQO8YBJLG2/iDATBlDAKp7+wB9J5BIPXVuALAxn/qdXaryiSbkzyZ5IEl1h+V5I4kzyW5tKs6JC1iHCFQZQjMiC7vWXw1cNoy658GfhH4cIc1SBqWdBMCOxv94YdmRmdBUFW3M2jsl1r/ZFXdBTzfVQ2SGqMKgMUafBv9mddlj2Bk
klyUZEuSLdu3b590OdJsGWUAaC7NRBBU1VVVtamqNq1fv37S5UizYa29AL/x98ZMBIGkVVprL8DGv1c8fVSaJwaAdkNnQZDkGuAkYF2SbcD7gL0AqurKJD8IbAH2A15M8svA0VX1bFc1SXPL+/9qDToLgqo6d4X1/xvY0NXnS71hL0Br5NCQNKsMAI2IQSDNGoeBNGIGgTQrDAB1xCCQpp03gVfHDAJpmtkL0BgYBNK08mCwxsQgkKaNvQCNmUEgTRN7AZoAg0CaFmudIE7aTQaBNGkGgCZsxdlHk7wmyQ1JnmpuPfnZJK8ZR3HS3NvdEHB6aI1Qm2mo/xPwB8APAq8GrgOu6bIoae6t5V4BBoBGrE0QpKp+t6p2NI/fA/xNlHaXQ0GaMm2OEXw+yWXAtQwC4GzgvyT5AYCqWvK+xJKGGACaUm2C4Ozmv/9owfJfYBAMHi+QVmIIaIqtGARVdfg4CpHmktcFaAasGARJ9gB+Ctg4vH1V/WZ3ZUkzzgDQDGkzNHQD8B3gfuDFbsuR5oAhoBnTJgg2VNUxnVcizToDQDOqzemjNyU5tfNKpFlmCGiGtekR3Al8JsnLgOeBAFVV+3VamTQrPCNIM65NEHwEOBG4v8rfWuklDAHNgTZB8BfAA4aAtIBTRGhOtAmCJ4Bbk9wEPLdzoaePqtd2JwQMAE2pNkHwaPPYu3lI/WUvQHOozZXF7x9HIdLUsxegOdXmyuL1wL8Afgh4+c7lVfUTHdYlTRdDQHOszXUEvw98BTgceD/wVeCuDmuSposhoDnXJggOrKrfAZ6vqtuq6heAN3ZclzQdDAH1QJuDxc83/30iyU8BjwMbuitJmgIeFFaPtAmCf5XklcA/Az4O7Af8cqdVSZNkCKhn2gTBX1fVM8AzwFsAkvzdTquSJsEAUE+1OUbw8ZbLpNllCKjHluwRJDkReBOwPsm7h1btB+zRdWHSWDhXkLTs0NDewD7NNvsOLX8WOKvLoqSxMAQkYJkgqKrbgNuSXF1VjwE0U1HvU1XPjqtAaeS8d4D0Em2OEXwoyX5JXgF8GXgoyT9f6UVJNid5MskDS6xPko8l2ZrkviSvX2Xt0uqttRdgCGgOtQmCo5sewD8AbgQOBd7W4nVXA6cts/504IjmcRHw71u8p7R7EoeCpCW0CYK9kuzFIAg+W1XPAyv+VVTV7cDTy2xyJvDJGrgT2D/Jq9oULbWys/G3FyAtq00Q/DaD+YVeAdye5DAGB4zX6mDg60PPtzXLdpHkoiRbkmzZvn37CD5ac2sUjf9OBoB6YsUgqKqPVdXBVXVG8+39MZoLy9Zosb/URf/yquqqqtpUVZvWr18/go/WXBpF4w/2AtQ7KwZBkoOS/E5zhzKSHA38/Ag+extwyNDzDQzmMZJWb5QhIPVMm6Ghq4GbgVc3zx9mNHMNXQ+c35w99Ebgmap6YgTvq74Z1TCQIaCeajPX0Lqq+oMkvwJQVTuSvLDSi5JcA5wErEuyDXgfsFfzHlcyOAPpDGAr8G3ggt3aA/WbxwKkNWsTBN9KciDN+P3Ob+8rvaiqzl1hfQHvalOktAsDQBqZNkHwbgbDOK9N8t+B9TjFhCbFq4KlkWtz8/r/meTNwJEMzvR5qLmWQBoPG3+pU21uXr8Hg7H8jc32pyahqn6z49rUF6M640fSbmkzNHQD8B3gfuDFbstR73QdAvYGpBW1CYINVXVM55Wof7oMAQNAaq3NdQQ3JTm180rUH6OaAmIphoC0Km16BHcCn2nuRfA8gwPGVVX7dVqZ5sM4x/8NAGm3tAmCjwAnAvc35/5LKzMApJnRJgj+AnjAEFAr4woAfx2lkWkTBE8AtzaTzj23c6Gnj2oXowwBG3ppbNoEwaPNY+/mIX1PFz0AQ0AaqzZXFr9/HIVohnjGjzRXlgyCJJdX1cVJbmCRG8ZU1U93WpmmjwEgzaXlegTnAxcDHx5TLZpGDv1Ic2+5IPhLgKq6bUy1aJoYAFJvLBcE65O8
e6mVnjU0w8Y9yZsBIE215YJgD2AfFr/JvGbNpGb4NASkqbdcEDxRVR8YWyUanUlP62zjL82U5YLAnsAsmHSjv5ONvzSzlguCk8dWhXbPNISAASDNvCWDoKqeHmchmiE2/tJcaTPFhKbRJHoDBoA0lwwCDdjIS71lEPSZjb8kDIL+sNGXtASDYF7Z8Etqqc3N6zVtVjpQbAhIWgWDQJJ6ziCQpJ4zCCSp5wyCWTMN00pImisGwbzxQLGkVTIIZom9AUkdMAgkqecMAknquU6DIMlpSR5KsjXJZYusPyDJZ5Lcl+TPkvxwl/XMrKTdsJDHByTths6CIMkewG8BpwNHA+cmOXrBZr8K3FNVxwDnA/+uq3pmlscFJHWsyx7BCcDWqnqkqr4LXAucuWCbo4FbAKrqK8DGJAd1WNP8sjcgaTd1GQQHA18fer6tWTbsXuBnAZKcABwGbOiwJknSAl0GwWJjGgu/tv4GcECSe4BLgC8BO3Z5o+SiJFuSbNm+ffvoK5WkHutyGuptwCFDzzcAjw9vUFXPAhcAJAnwaPNgwXZXAVcBbNq0qT9jIG2PDzgsJGkNuuwR3AUckeTwJHsD5wDXD2+QZP9mHcA7gNubcFBbhoCkNeqsR1BVO5JcDNwM7AFsrqoHk7yzWX8l8Drgk0leAL4MXNhVPXPJEJA0Ap3eoayqbgRuXLDsyqGf7wCO6LIGSdLyvLJ4Wnn9gKQxMQhmlcNCkkbEIJhG9gYkjZFBIEk9ZxBMG3sDksbMIJhFHh+QNEIGwTSxNyBpAgyCaeF0EpImpNMLytSCvQBJE2aPYJJWGwL2BiR1wB7BJNgLkDRF7BGMmyEgacrYIxiXtQaAw0KSOmIQdM0AkDTlDIIujGr4xxCQNAYGwaiMcuzfAJA0RgbBWnRx4NcQkDRmBsHuMAAkzRGDoI2uT/k0BCRNkEGwlHGc728ASJoCBgGM/yIvA0DSFOlvENj4SxLQtyCw8ZekXfQnCMYVAjb+kmZMP4LAA7+StKT5D4KuQsCGX9KcmP8gGCUbf0lzyCBYio2+pJ4wCIbZ+EvqIYPAxl9Sz/X7VpWGgCT1PAgkST0IgqW+9dsbkCSgL8cIbPQlaUnz3yOQJC3LIJCknus0CJKcluShJFuTXLbI+lcmuSHJvUkeTHJBl/VIknbVWRAk2QP4LeB04Gjg3CRHL9jsXcCXq+pY4CTgI0n27qomSdKuuuwRnABsrapHquq7wLXAmQu2KWDfJAH2AZ4GdnRYkyRpgS6D4GDg60PPtzXLhl0OvA54HLgf+KWqenHhGyW5KMmWJFu2b9/eVb2S1EtdBsFi8z8vPI/zJ4F7gFcDxwGXJ9lvlxdVXVVVm6pq0/r160dfqST1WJdBsA04ZOj5Bgbf/IddAHy6BrYCjwJHjbySU04Z3Jdg5+OUU0b+EZI0q7oMgruAI5Ic3hwAPge4fsE2XwNOBkhyEHAk8MhIqzjlFLjllpcuu+UWw0CSGp1dWVxVO5JcDNwM7AFsrqoHk7yzWX8l8EHg6iT3MxhKek9VPTXSQhaGwErLJalnOp1ioqpuBG5csOzKoZ8fB07tsgZJ0vK8sliSem7+g+Dkk1e3XJJ6Zv6D4E/+ZNdG/+STB8slST2ZhtpGX5KWNP89AknSsgwCSeo5g0CSes4gkKSeMwgkqedSM3Zj9yTbgcd28+XrgNFOYTH93Od+cJ/7YS37fFhVLTp988wFwVok2VJVmyZdxzi5z/3gPvdDV/vs0JAk9ZxBIEk917cguGrSBUyA+9wP7nM/dLLPvTpGIEnaVd96BJKkBQwCSeq5uQyCJKcleSjJ1iSXLbI+ST7WrL8vyesnUecotdjn85p9vS/JF5McO4k6R2mlfR7a7keTvJDkrHHW14U2+5zkpCT3JHkwyW3jrnHUWvxuvzLJDUnubfb5gknUOSpJNid5MskDS6wffftVVXP1YHB/5L8EXgPsDdwLHL1gmzOAmxjcJ/mNwP+YdN1j2Oc3AQc0
P5/eh30e2u5PGdwy9axJ1z2Gf+f9gS8DhzbP/+ak6x7DPv8q8K+bn9cDTwN7T7r2Nezz3wNeDzywxPqRt1/z2CM4AdhaVY9U1XeBa4EzF2xzJvDJGrgT2D/Jq8Zd6AituM9V9cWq+uvm6Z3AhjHXOGpt/p0BLgH+M/DkOIvrSJt9/ofAp6vqawBVNev73WafC9g3SYB9GATBjvGWOTpVdTuDfVjKyNuveQyCg4GvDz3f1ixb7TazZLX7cyGDbxSzbMV9TnIw8DPAlWOsq0tt/p3/NnBAkluT3J3k/LFV1402+3w58DrgceB+4Jeq6sXxlDcRI2+/5vEOZVlk2cJzZNtsM0ta70+StzAIgh/rtKLutdnnjwLvqaoXBl8WZ16bfd4TOB44Gfh+4I4kd1bVw10X15E2+/yTwD3ATwCvBf44yReq6tmui5uQkbdf8xgE24BDhp5vYPBNYbXbzJJW+5PkGOATwOlV9Vdjqq0rbfZ5E3BtEwLrgDOS7KiqPxpPiSPX9nf7qar6FvCtJLcDxwKzGgRt9vkC4DdqMIC+NcmjwFHAn42nxLEbefs1j0NDdwFHJDk8yd7AOcD1C7a5Hji/Ofr+RuCZqnpi3IWO0Ir7nORQ4NPA22b42+GwFfe5qg6vqo1VtRH4Q+CfzHAIQLvf7c8CP55kzyR/A3gD8OdjrnOU2uzz1xj0gEhyEHAk8MhYqxyvkbdfc9cjqKodSS4GbmZwxsHmqnowyTub9VcyOIPkDGAr8G0G3yhmVst9fi9wIHBF8w15R83wzI0t93mutNnnqvrzJJ8D7gNeBD5RVYuehjgLWv47fxC4Osn9DIZN3lNVMzs9dZJrgJOAdUm2Ae8D9oLu2i+nmJCknpvHoSFJ0ioYBJLUcwaBJPWcQSBJPWcQSNIUW2kSugXb/ttmwsF7kjyc5BttPsMg0NxK8mvNbJT3NX8Yb1jl69+e5NWrfM3Gxf5gh5cnOS7JGat5X/Xa1cBpbTasqn9aVcdV1XHAxxlcO7Qig0BzKcmJwN8HXl9VxwCn8NL5WVZ6/R7A24FVBUFLxzE4D1xa0WKT0CV5bZLPNfNJfSHJUYu89FzgmjafYRBoXr2KwVQLzwFU1VNV9ThAkpOTfCnJ/U23+/ua5V9N8t4k/43BH9Em4Peb3sT3Jzk+yW3NH9/NO2d8bJbfm+QO4F3LFdVcHfsB4Ozmfc9O8oqmjruaus5stn17kj/KYK79R5NcnOTdzTZ3JvmBjv7fafpdBVxSVccDlwJXDK9MchhwOIMp2FdkEGhe/VfgkGac9IokbwZI8nIGXe2zq+rvMLi6/h8Pve47VfVjVfV7wBbgvKabvYNBV/us5o9vM/DrzWv+A/CLVXXiSkU1Uym/F/hU04X/FPBrwJ9W1Y8CbwH+TZJXNC/5YQZTS5/QfN63q+pHgDuAWZ9ZVLshyT4M7i9yXZJ7gN9m8MVn2DnAH1bVC23ec+6mmJAAqur/JDke+HEGjeunMri71ZeAR4fmW/qPDL7Ff7R5/qkl3vJIBo3yHzdTdOwBPJHklcD+VbXzTmC/y+DGP6txKvDTSS5tnr8cOLT5+fNV9U3gm0meAW5olt8PHLPKz9F8eBnwjeYLylLOYYXe6TCDQHOr+TZ0K3BrMw/NzzOYrng531pieYAHF37rT7I/a5/CPMBbq+qhBe/9BuC5oUUvDj1/Ef9+e6mqnm2GCn+uqq7L4JvJMVV1L0CSI4EDGPQaW3FoSHMpyZFJjhhadBzwGPAVYGOSv9Usfxuw1H19vwns2/z8ELC+OQhNkr2S/FBVfQN4JsnO+zuc16K84feFwYRqlzR/0CT5kRbvoZ5oJqG7AzgyybYkFzL4Pbswyb3Ag7z0rm3nAtfWKiaS8xuF5tU+wMebb+w7GMzUeFFVfSeDm5tfl2RPBtMcLzVT6dXAlUn+L3AicBbwsWY4aE8Gw0kPMpj9cXOSbzNo1FfyeeCyZnz3Qwxmz/wocF8TBl9lcMaTRFWdu8Sq
RU8prap/udrPcPZRSeo5h4YkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ67v8BxAzc0blzHTYAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as pyplot\n",
    "\n",
    "# sort_values returns a sorted copy, so the ordering of the original data is unchanged\n",
    "timestamp_sorted = URM_all_dataframe[\"Timestamp\"].sort_values().values\n",
    "\n",
    "pyplot.plot(timestamp_sorted, 'ro')\n",
    "pyplot.ylabel('Timestamp')\n",
    "pyplot.xlabel('Sorted Item')\n",
    "pyplot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### To store the data we use a sparse matrix. We build it as a COO matrix and then change its format\n",
    "\n",
    "#### The COO constructor expects (data, (row, column))"
   ]
  },
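  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: the COO constructor on toy data\n",
    "\n",
    "A minimal sketch, not part of the original practice. The three arrays are aligned position by position: `toy_data[k]` is stored at `(toy_row[k], toy_col[k])`. The values and the `toy_` names are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Three made-up interactions: (user 0, item 1), (user 0, item 2), (user 2, item 0)\n",
    "toy_data = np.array([5.0, 3.0, 4.0])\n",
    "toy_row = np.array([0, 0, 2])\n",
    "toy_col = np.array([1, 2, 0])\n",
    "\n",
    "toy_matrix = sps.coo_matrix((toy_data, (toy_row, toy_col)), shape=(3, 3))\n",
    "\n",
    "# Dense view, only reasonable for tiny matrices like this one\n",
    "print(toy_matrix.toarray())"
   ]
  },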
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<69878x10677 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 10000054 stored elements in COOrdinate format>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import scipy.sparse as sps\n",
    "\n",
    "URM_all = sps.coo_matrix((URM_all_dataframe[\"Interaction\"].values, \n",
    "                          (URM_all_dataframe[\"UserID\"].values, URM_all_dataframe[\"ItemID\"].values)))\n",
    "\n",
    "URM_all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<69878x10677 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 10000054 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# tocsr() returns a converted copy; URM_all itself stays in COO format\n",
    "URM_all.tocsr()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
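    "### We compute the item popularity as the number of interactions in each column\n",
    "\n",
    "### We can use a property of sparse matrices in CSC format: indptr stores where each column starts, so the difference between consecutive entries is the number of stored elements in that column"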
   ]
  },
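  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Side note: `indptr` on toy data\n",
    "\n",
    "A minimal sketch, not part of the original practice. In the made-up matrix below, column 0 has three interactions, column 1 has none and column 2 has one, and the differences of `indptr` recover exactly these counts. The `toy_` names are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Made-up interactions: users 0, 1, 2 with item 0, and user 2 with item 2\n",
    "toy_rows = [0, 1, 2, 2]\n",
    "toy_cols = [0, 0, 0, 2]\n",
    "toy_URM = sps.coo_matrix((np.ones(4), (toy_rows, toy_cols)), shape=(3, 3)).tocsc()\n",
    "\n",
    "# indptr[j] is the position in the data array where column j starts\n",
    "print(toy_URM.indptr)\n",
    "\n",
    "# Differences between consecutive entries: stored elements per column\n",
    "print(np.ediff1d(toy_URM.indptr))"
   ]
  },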
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 2412, 14975, 17851, ...,     1,     1,     1], dtype=int32)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "item_popularity = np.ediff1d(URM_all.tocsc().indptr)\n",
    "item_popularity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([    1,     1,     1, ..., 33668, 34457, 34864], dtype=int32)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "item_popularity = np.sort(item_popularity)\n",
    "item_popularity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEGCAYAAABPdROvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df/RVdZ3v8eeLHyKaiOhXFwIKJqs1aCuU7xBOrXtNnGRcTdiMTRQluphLqZX9WPemOavsdlsrp1vebAaL0kSlhMxJcmHloFndy4BfGhRRSeYC+hWuYJqSJgq87x/7c2Tz9XzP9/w+33O+r8dae5193nt/9v58BM+bz96f/dmKCMzMzKo1rNUVMDOz9uZEYmZmNXEiMTOzmjiRmJlZTZxIzMysJiNaXYFmO+6442Ly5MmtroaZWVtZv379sxHRVWzbkEskkydPpqenp9XVMDNrK5K297fNl7bMzKwmTiRmZlYTJxIzM6uJE4mZmdWkYYlE0uGS1kl6SNImSV9K8WskPS1pQ1rOz5W5StIWSZslnZeLz5C0MW27XpJSfJSk5Sm+VtLkRrXHzMyKa2SPZC9wTkS8DZgOzJE0K227LiKmp2UVgKRpwDzgNGAOsFjS8LT/DcAiYGpa5qT4QuD5iDgVuA64toHtMTNrT8uWweTJMGxY9rlsWV0P37BEEpk/pq8j01JqquG5wO0RsTcitgJbgJmSxgNjImJNZFMV3wJckCuzNK3fAcwu9FbMzIwsaSxaBNu3Q0T2uWhRXZNJQ++RSBouaQOwC7g3ItamTR+X9LCkmyQdk2ITgKdyxXtTbEJa7xs/pExE7ANeAI4tUo9Fknok9ezevbtOrTMzawNXXw0vv3xo7OWXs3idNDSRRMT+iJgOTCTrXZxOdpnqzWSXu3YCX0+7F+tJRIl4qTJ967EkIrojorurq+iDmWZmnenJJyuLV6Epo7Yi4g/AL4E5EfFMSjAHgO8CM9NuvcCkXLGJwI4Un1gkfkgZSSOAo4HnGtQMM7P2M25cZfEqNHLUVpeksWl9NHAu8Hi651HwPuCRtL4SmJdGYk0hu6m+LiJ2AnskzUr3Py4C7sqVWZDWLwTuC7/y0cysqRo519Z4YGkaeTUMWBERd0u6VdJ0sktQ24CPAkTEJkkrgEeBfcDlEbE/HetS4GZgNHBPWgBuBG6VtIWsJzKvge0xM2s/z/Vzkaa/eBU01P4B393dHZ600cyGjMmTs5FafZ18MmzbVvZhJK2PiO5i2/xku5lZJzv//MriVXAiMTPrZKtWVRavghOJmVkn65Thv2Zm1iInnVRZvApOJGZmnewrX4Ejjjg0dsQRWbxOnEjMzDrZ/PmwYAEMT3PgDh+efZ8/v26ncCIxM+tky5bB0qWwPz2Wt39/9r1dJm00M7MWa/dJG83MrMU8asvMzGriUVtmZlYTj9oyM7OazJ8PS5Zkc2tJ2eeSJXUdtdXI2X/NzGwwmD+/romjL/dIzMw63bJl2SzAw4Zln3Uc+gvukZiZdbZly2DRooNDgLdvz75D3Xop7pGYmXUyP0diZmY18XMkZmZWEz9HYmZmNfFzJGZmVpMmPEfSsEQi6XBJ6yQ9JGmTpC+l+DhJ90p6In0ekytzlaQtkjZLOi8XnyFpY9p2vSSl+ChJy1N8raTJjWqPmVnbmj8ftm2DAweyzzo/U9LIHsle4JyIeBswHZgjaRZwJbA6IqYCq9N3JE0D5gGnAXOAxZLSBPrcACwCpqZlToovBJ6PiFOB64BrG9geMzMromGJJDJ/TF9HpiWAucDSFF8KXJDW5wK3R8TeiNgKbAFmShoPjImINRERwC19yhSOdQcwu9BbMTOz5mjoPRJJwyVtAHYB90bEWuCEiNgJkD6PT7tPAJ7KFe9NsQlpvW/8kDIRsQ94ATi2SD0WSeqR1LN79+56Nc/MzGhwIomI/RExHZhI1rs4vcTuxXoS
USJeqkzfeiyJiO6I6O7q6hqo2mZmVoGmjNqKiD8AvyS7t/FMulxF+tyVdusFJuWKTQR2pPjEIvFDykgaARwNPNeQRpiZtasGz7XVyFFbXZLGpvXRwLnA48BKYEHabQFwV1pfCcxLI7GmkN1UX5cuf+2RNCvd/7ioT5nCsS4E7kv3UczMDODcc+HDH87m2Io4ONdWHZNJIydtHA8sTSOvhgErIuJuSWuAFZIWAk8C7weIiE2SVgCPAvuAyyMiva2eS4GbgdHAPWkBuBG4VdIWsp7IvAa2x8ysvVx2Gaxe/cZ4Ya6tOg0D1lD7B3x3d3f09PS0uhpmZo03YgTs3198m5Q9V1ImSesjorvYNj/ZbmbWqfpLIuC5tszMrEaea8vMzGrSDnNtmZlZi518cmXxKjmRmJl1qiZMIQ9OJGZmnasJU8hDY58jMTOzVps/v+6Joy/3SMzMrCZOJGZmneyyy7IHE6Xs87LL6n4KX9oyM+tUl10GN9xw8Pv+/Qe/L15ct9O4R2Jm1qmWLKksXiUnEjOzTtXfFCmlpk6pghOJmVmnGtbPT3x/8WpPU9ejmZnZ4DF6dGXxKjmRmJl1qpdeqixeJScSM7NO5UtbZmZWk/5eXFXBC63K4URiZmY1cSIxM7OaNCyRSJok6X5Jj0naJOmKFL9G0tOSNqTl/FyZqyRtkbRZ0nm5+AxJG9O26yUpxUdJWp7iayVNblR7zMzaTvZTWX68So3skewDPhsRfwbMAi6XNC1tuy4ipqdlFUDaNg84DZgDLJY0PO1/A7AImJqWOSm+EHg+Ik4FrgOubWB7zMzaS0Rl8So1LJFExM6I+G1a3wM8BkwoUWQucHtE7I2IrcAWYKak8cCYiFgTEQHcAlyQK7M0rd8BzC70VszMhrzhwyuLV6kp90jSJaczgLUp9HFJD0u6SdIxKTYBeCpXrDfFJqT1vvFDykTEPuAF4Ngi518kqUdSz+7du+vSJjOzQa9TpkiR9Cbgx8CnIuJFsstUbwamAzuBrxd2LVI8SsRLlTk0ELEkIrojorurq6vCFpiZtalOeGe7pJFkSWRZRNwJEBHPRMT+iDgAfBeYmXbvBSblik8EdqT4xCLxQ8pIGgEcDTzXmNaYmbWZ88+vLF6lAROJpDdLGpXWz5b0SUljyygn4EbgsYj4Ri4+Prfb+4BH0vpKYF4aiTWF7Kb6uojYCeyRNCsd8yLgrlyZBWn9QuC+dB/FzMxWraosXqVyXmz1Y6Bb0qlkiWEl8ANgoJT2DuAjwEZJG1Ls88AHJU0nuwS1DfgoQERskrQCeJRsxNflEVG4kHcpcDMwGrgnLaT63CppC1lPZF4Z7TEzGxq2b68sXiUN9A94Sb+NiDMl/VfglYj4lqR/j4gz6lqTJunu7o6enp5WV8PMrPGGDSs+1FeqeJoUSesjorvoacoo/5qkD5JdQro7xUZWVAMzM2u+QfQcySXAWcBXImJrun9xW11rYWZmbWvAeyQR8Sjwydz3rcBXG1kpMzNrHwMmEknvAK4BTk77C4iIOKWxVTMzs3ZQzqitG4FPA+uB+j4OaWZmba+cRPJCRNwz8G5mZjYUlZNI7pf0NeBOYG8hWJiQ0czMBqkjjyz+fvYjj6zracpJJG9Pn/nxwwGcU9eamJlZfR1+ePFEcvjhdT1NOaO23lXXM5qZWXM818/Ug/3Fq1TOXFtHS/pGYRp2SV+XdHRda2FmZvU3blxl8SqV80DiTcAe4O/S8iLw/brWwszM6u+VVyqLV6mceyRvjoi/zX3/Um4SRjMzG6yK3R8pFa9SOT2SP0l6Z+FLekDxT3WthZmZta1yeiSXAkvTfRGRTdd+cSMrZWZmdTBYhv9GxAbgbZLGpO8v1rUGZmbW1vpNJJI+HBG3SfpMnzgA+bcempnZINSkeySleiSFvs9RRbb5dbZmZgaUSCQR8Z20+q8R8b/z29INdzMzs7JGbX2rzJiZmQ0m6VZE2fEq9ZtIJJ0l6bNAl6TP5JZr
gOEDHVjSJEn3S3pM0iZJV6T4OEn3SnoifR6TK3OVpC2SNks6LxefIWlj2na90o0aSaMkLU/xtZImV/1fwsys03zsY5XFq1SqR3IY8Cayy19H5ZYXgQvLOPY+4LMR8WfALOBySdOAK4HVETEVWJ2+k7bNA04D5gCLJRUS1g3AImBqWuak+ELg+Yg4FbgOuLaMepmZDQ2LF8Ps2YfGZs/O4nVU6h7JA8ADkm6OiO2VHjgidgI70/oeSY8BE4C5wNlpt6XAL4HPpfjtEbEX2CppCzBT0jZgTESsAZB0C3ABcE8qc0061h3AP0lSRJ3fbG9m1o6WLYNf//rQ2K9/ncXnz6/bacq5R/I9SWMLXyQdI+nnlZwkXXI6A1gLnJCSTCHZHJ92mwA8lSvWm2IT0nrf+CFlImIf8AJwbCV1MzPrWFdcAa++emjs1VezeB2Vk0iOi4g/FL5ExPMc/PEfkKQ3AT8GPjXAw4zF7v5EiXipMn3rsKgwe/Hu3bsHqrKZWWf4/e8ri1epnERyQNJJhS+STqbM50gkjSRLIssi4s4UfkbS+LR9PLArxXuBSbniE4EdKT6xSPyQMpJGAEeTTeFyiIhYEhHdEdHd1dVVTtXNzKxM5SSSq4HfSLpV0q3Ar4CrBiqURlbdCDzW5yn4lcCCtL4AuCsXn5dGYk0hu6m+Ll3+2iNpVjrmRX3KFI51IXCf74+YmTVXOXNt/UzSmWQjrwR8OiKeLePY7wA+AmzMTTv/eeCrwApJC4Engfen82yStAJ4lGzE1+URsT+VuxS4GRhNdpP9nhS/Ebg13Zh/jmzUl5mZNVE5s/8C7Ce7BHU4ME0SEfGrUgUi4jcUv4cBMLtYMCK+AnylSLwHOL1I/BVSIjIzs5zLLmvaqQZMJJL+HriC7N7EBrKeyRrgnMZWzczMqrZkSdNOVc49kiuAPwe2R8S7yIbxeuiTmdlgtn//wPvUSTmJ5JV0CQlJoyLiceAtja2WmZk1TJ3n2irnHklveiDxJ8C9kp7n4PBbMzNrN3Wea6ucUVvvS6vXSLqf7FmNn9W1FmZm1jzNmmsLQNIw4OGIOB1en3/LzMzsdSXvkUTEAeCh/JPtZmZmeeXcIxkPbJK0Dnj9Rb8R8d6G1crMzNpGOYnkSw2vhZmZta1ybrY/kCZqnBoR/yrpCMp4Q6KZmQ0NAz5HIum/kL006jspNIFsKLCZmVlZDyReTjYB44sAEfEEFbyPxMzMOls5iWRvRLz+iq303g9P1W5mZkB5ieQBSZ8HRkv6S+BHwE8bWy0zM2sX5SSSK8kmadwIfBRYFRFXN7RWZmbWNsoZ/vuJiPgm8N1CQNIVKWZmZkNcOT2SBUViF9e5HmZm1qb67ZFI+iDwIWCKpJW5TUcBv290xczMrD2UurT1f4CdwHHA13PxPcDDjayUmZm1j34TSURsB7YDZzWvOmZm1m7KebL9byQ9IekFSS9K2iPpxTLK3SRpl6RHcrFrJD0taUNazs9tu0rSFkmbJZ2Xi8+QtDFtu17KXu0laZSk5Sm+VtLkShtvZma1K+dm+z8C742IoyNiTEQcFRFjyih3MzCnSPy6iJiellUAkqYB84DTUpnFkgrzed0ALAKmpqVwzIXA8xFxKnAdcG0ZdTIz63zLljX1dOUkkmci4rFKDxwRvwKeK3P3ucDtEbE3IrYCW4CZksYDYyJiTUQEcAtwQa7M0rR+BzC70FsxMxvSrriiqacr5zmSHknLySZq3FsIRsSdVZ7z45IuAnqAz0bE82QTQf5bbp/eFHstrfeNkz6fSnXZJ+kF4Fjg2b4nlLSIrFfDSSf5HV1m1uF+39yBteX0SMYALwPvBv46Le+p8nw3AG8GppONCCuMBivWk4gS8VJl3hiMWBIR3RHR3dXVVVmNzcw6ybHH1v2Q5byP5JJ6nSwinimsS/oucHf62gtMyu06EdiR4hOLxPNletNEkkdT/qU0M7Oh6Zv1n5Sk1AOJ36LELL8R8clKTyZpfETs
TF/fBxRGdK0EfiDpG8CJZDfV10XE/jRKbBawFrgI+FauzAJgDXAhcF+6j2JmZv2ZP7/uhyzVI+mp5cCSfgicDRwnqRf4InC2pOlkCWob2SSQRMQmSSuAR4F9wOURsT8d6lKyEWCjgXvSAnAjcKukLWQ9kXm11NfMzKqjofaP+O7u7ujpqSlHmpkNbqUGsFb5my9pfUR0F9tWzs12MzOzfjmRmJlZTZxIzMysJgMO/5U0BfgEMDm/f0S8t3HVMjOzdlHOk+0/IRsh9VPgQGOrY2Zm7aacRPJKRFzf8JqYmVlbKieRfFPSF4FfcOhcW79tWK3MzKw6TZ75F8pLJG8FPgKcw8FLW5G+m5nZYPKxjzX9lOUkkvcBp0TEq42ujJmZ1eiPf2z6KcsZ/vsQMLbRFTEzswY78siGHLacHskJwOOSHuTQeyQe/mtm1k6+852GHLacRPLFhpzZzMyaqwEz/0J57yN5oCFnNjOzjlDOk+17OPheksOAkcBLETGmkRUzM7P2UE6P5Kj8d0kXADMbViMzM2srFU/aGBE/wc+QmJkNPocd1pLTlnNp629yX4cB3ZR4Ba+ZmbXIa6+15LTljNr669z6PrJX5M5tSG3MzKztlHOP5JJmVMTMzBpo2rSGHbrfRCLpCyXKRUR8udSBJd0EvAfYFRGnp9g4YDnZu022AX8XEc+nbVcBC4H9wCcj4ucpPgO4GRgNrAKuiIiQNAq4BZgB/B74QERsK91cM7MhatOmhh261M32l4oskP3Yf66MY98MzOkTuxJYHRFTgdXpO5KmAfOA01KZxZKGpzI3AIuAqWkpHHMh8HxEnApcB1xbRp3MzKzO+k0kEfH1wgIsIesRXALcDpwy0IEj4lfAc33Cc4GlaX0pcEEufntE7I2IrcAWYKak8cCYiFgTEUHWA7mgyLHuAGZL0kD1MjOz+io5/FfSOEn/A3iY7DLYmRHxuYjYVeX5ToiInQDp8/gUnwA8lduvN8UmpPW+8UPKRMQ+4AXg2H7asUhSj6Se3bt3V1l1M7NBrIX/ju43kUj6GvAgsAd4a0RcU7if0QDF/gtEiXipMm8MRiyJiO6I6O7q6qqyimZmVkypHslngROBfwB2SHoxLXskvVjl+Z5Jl6tIn4WeTS8wKbffRGBHik8sEj+kjKQRwNG88VKamZk1WKl7JMMiYnREHBURY3LLUTXMs7USWJDWFwB35eLzJI2SNIXspvq6dPlrj6RZ6f7HRX3KFI51IXBfuo9iZmZ5o0c39PDlPJBYFUk/BM4GjpPUSzYd/VeBFZIWAk8C7weIiE2SVgCPkj30eHlE7E+HupSDw3/vSQvAjcCtkraQ9UTmNaotZmZt7eWXG3p4DbV/xHd3d0dPT0+rq2FmVj9HHAF/+lP/2+vwOy9pfUR0F9tW8aSNZmY2yJRKIk3gRGJmZjVxIjEzs5o4kZiZtbMJE0pvb+BkjQVOJGZm7WzHjtLbGzhZY4ETiZmZ1cSJxMysXQ2SeWqdSMzMOlWDn2gvcCIxM+tUDX6ivcCJxMysHQ2Sy1rgRGJm1pnGjm3aqZxIzMw60fONen3UGzmRmJm1m+HDW12DQziRmJm1mwMHSm9v8qzuTiRmZlYTJxIzs3ZyxBGtrsEbOJGYmbWTgd49cumlzalHjhOJmVknWby46ad0IjEzs5q0JJFI2iZpo6QNknpSbJykeyU9kT6Pye1/laQtkjZLOi8Xn5GOs0XS9dIgetTTzKzeBulPXCt7JO+KiOm5l8lfCayOiKnA6vQdSdOAecBpwBxgsaTCIOobgEXA1LTMaWL9zcya57DDBt6nycN+CwbTpa25wNK0vhS4IBe/PSL2RsRWYAswU9J4YExErImIAG7JlTEz6yyvvdbqGvSrVYkkgF9IWi9pUYqdEBE7AdLn8Sk+AXgqV7Y3xSak9b5xM7POMtDrdAFGjmx8PfoxokXnfUdE7JB0PHCvpMdL7FvsomCUiL/xAFmyWgRw0kknVVpXM7PW
Guh1ugCvvtr4evSjJT2SiNiRPncB/wLMBJ5Jl6tIn7vS7r3ApFzxicCOFJ9YJF7sfEsiojsiuru6uurZFDOzxhqkN9jzmp5IJB0p6ajCOvBu4BFgJbAg7bYAuCutrwTmSRolaQrZTfV16fLXHkmz0miti3JlzMzaX7lJpEU32QtacWnrBOBf0kjdEcAPIuJnkh4EVkhaCDwJvB8gIjZJWgE8CuwDLo+I/elYlwI3A6OBe9JiZtb+2qAnUqBocSZrtu7u7ujp6Wl1NczM+ldJEmnSb7ik9bnHNQ4xmIb/mpkNbcuWVZZEbrutcXWpQKtGbZmZWV6ll7JOPBHmz29MXSrkHomZWatVmkRGjoSnn25MXargRGJm1krV3FRv4TMjxfjSlplZK1Q7KmsQDpByIjEza6ZahvUOwiQCvrRlZtYcUvVJZNq0QZtEwD0SM7PGqvXBwkGcQArcIzEzq7dzz62tB1LQBkkE3CMxM6ufek5r0iZJBNwjMTOr3jHHHOx51CuJzJ7dVkkE3CMxM6tMIydTbLMEUuBEYmZWSjNm4W3TBFLgRGJmVtDsqdvbPIEUOJGY2dDT6nd9dEgCKXAiMbPO0+pEUUyHJY88j9oys/aRHyFVahksxo7NEkgHJxFwj8TMmm0w/dA3QocnjWLcIzGzTLn/2q916TS33Xaw1zEEkwi4R2JD3WmnwaOPtroW1m6GaMLoT9snEklzgG8Cw4HvRcRX636Sww6D116r+2HNbJAbOXLQvURqMGrrS1uShgP/DPwVMA34oKRpdT2Jk4jZ0JC/PFVYnETK0taJBJgJbImI/xsRrwK3A3PregYnEbPOUCxRDPF7G/XS7olkAvBU7ntvih1C0iJJPZJ6du/e3bTKmVkTDJQgnCgart0TSbEhIG/4GxMRSyKiOyK6u7q6mlAtM6tIucnACWJQaveb7b3ApNz3icCOup5h5Ehf3rKhyz/UVoZ2TyQPAlMlTQGeBuYBH6rrGV591Tfchxr/eJpVpK0TSUTsk/Rx4Odkw39viohNdT+RR26YmfWrrRMJQESsAla1uh5mZkNVu99sNzOzFnMiMTOzmjiRmJlZTZxIzMysJoohNtRR0m5ge5XFjwOerWN1BiO3sTO4jZ1hMLXx5Igo+kT3kEsktZDUExHdra5HI7mNncFt7Azt0kZf2jIzs5o4kZiZWU2cSCqzpNUVaAK3sTO4jZ2hLdroeyRmZlYT90jMzKwmTiRmZlYTJ5IySZojabOkLZKubHV9yiVpkqT7JT0maZOkK1J8nKR7JT2RPo/JlbkqtXOzpPNy8RmSNqZt10sq9mKxlpE0XNK/S7o7fe+oNkoaK+kOSY+nP8+zOrCNn05/Tx+R9ENJh3dCGyXdJGmXpEdysbq1S9IoSctTfK2kyc1sHxHhZYCFbIr6/wBOAQ4DHgKmtbpeZdZ9PHBmWj8K+B0wDfhH4MoUvxK4Nq1PS+0bBUxJ7R6etq0DziJ7M+U9wF+1un192voZ4AfA3el7R7URWAr8fVo/DBjbSW0ke032VmB0+r4CuLgT2gj8J+BM4JFcrG7tAi4Dvp3W5wHLm9q+Vv/laYcl/cH9PPf9KuCqVteryrbcBfwlsBkYn2Ljgc3F2kb2rpez0j6P5+IfBL7T6vbk6jMRWA2ck0skHdNGYEz6kVWfeCe1cQLwFDCO7BUXdwPv7pQ2ApP7JJK6tauwT1ofQfY0vBrVlr6LL22Vp/AXvKA3xdpK6u6eAawFToiInQDp8/i0W39tnZDW+8YHi/8F/DfgQC7WSW08BdgNfD9dvvuepCPpoDZGxNPA/wSeBHYCL0TEL+igNvZRz3a9XiYi9gEvAMc2rOZ9OJGUp9j11bYaNy3pTcCPgU9FxIuldi0SixLxlpP0HmBXRKwvt0iR2KBuI9m/Ms8EboiIM4CXyC6H9Kft2pjuEcwlu5xzInCkpA+XKlIkNqjbWKZq2tXSNjuRlKcXmJT7PhHY0aK6VEzSSLIksiwi7kzh
ZySNT9vHA7tSvL+29qb1vvHB4B3AeyVtA24HzpF0G53Vxl6gNyLWpu93kCWWTmrjucDWiNgdEa8BdwJ/QWe1Ma+e7Xq9jKQRwNHAcw2reR9OJOV5EJgqaYqkw8huZq1scZ3KkkZ13Ag8FhHfyG1aCSxI6wvI7p0U4vPSKJApwFRgXep675E0Kx3zolyZloqIqyJiYkRMJvuzuS8iPkxntfH/AU9JeksKzQYepYPaSHZJa5akI1LdZgOP0VltzKtnu/LHupDs/4Hm9cJafQOqXRbgfLIRT/8BXN3q+lRQ73eSdXEfBjak5Xyy66ergSfS57hcmatTOzeTG+0CdAOPpG3/RBNv5lXQ3rM5eLO9o9oITAd60p/lT4BjOrCNXwIeT/W7lWzkUtu3Efgh2X2f18h6Dwvr2S7gcOBHwBaykV2nNLN9niLFzMxq4ktbZmZWEycSMzOriROJmZnVxInEzMxq4kRiZmY1cSIx64ekq9NMtA9L2iDp7RWWv1jSiRWWmZyfIbZYXNJ0SedXclyzRhrR6gqYDUaSzgLeQzZz8l5Jx5HNuFtu+eFkM9c+Qv2fqp5O9jzBqjof16wq7pGYFTceeDYi9gJExLMRsQNA0uw0ceLG9J6JUSm+TdIXJP2GbGbWbmBZ6s2MTu+SeEDSekk/z02PMUPSQ5LWAJeXqlSaWeG/Ax9Ix/2ApCNTPR5M9Zqb9r1Y0k8k/VTSVkkfl/SZtM+/SRrXoP92NsQ4kZgV9wtgkqTfSVos6T8DSDocuBn4QES8laxXf2mu3CsR8c6IuI3sKfT5ETEd2Ad8C7gwImYANwFfSWW+D3wyIs4aqFIR8SrwBbL3TUyPiOVkT0HfFxF/DrwL+FqaGRjgdOBDwMx0vpcjm/RxDdkUG2Y1cyIxKyIi/gjMABaRTd++XNLFwFvIJhb8Xdp1KdlLiwqW93PIt5D9qN8raQPwD8BESUcDYyPigbTfrVVU993Alem4vySbLuOktO3+iNgTEbvJphb/aYpvJHs/hlnNfI/ErB8RsZ/sh/mXkjaSTYq3YYBiL/UTF7Cpb69D0lhqn+5bwN9GxOY+x347sDcXOpD7fgD//2914h6JWRGS3iJpai40HdhONqHgZEmnpvhHgAf6lk/2kCLa5ycAAAC4SURBVL3eGLLJ97rSTXwkjZR0WkT8AXhB0jvTfvPLqF7+uJC9He8TaUZYJJ1RxjHM6saJxKy4NwFLJT0q6WGy92hfExGvAJcAP0q9lAPAt/s5xs3At9Mlp+Fk03tfK+khsp7NX6T9LgH+Od1s/1MZdbsfmFa42Q58GRgJPJyGCH+58uaaVc+z/5qZWU3cIzEzs5o4kZiZWU2cSMzMrCZOJGZmVhMnEjMzq4kTiZmZ1cSJxMzMavL/Afz8riQ3dqZkAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "pyplot.plot(item_popularity, 'ro')\n",
    "pyplot.ylabel('Num Interactions ')\n",
    "pyplot.xlabel('Sorted Item')\n",
    "pyplot.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average per-item interactions over the whole dataset 936.60\n",
      "Average per-item interactions for the top 10% popular items 6479.52\n",
      "Average per-item interactions for the least 10% popular items 5.23\n",
      "Average per-item interactions for the median 10% popular items 136.45\n"
     ]
    }
   ],
   "source": [
    "ten_percent = int(n_items/10)\n",
    "\n",
    "print(\"Average per-item interactions over the whole dataset {:.2f}\".\n",
    "      format(item_popularity.mean()))\n",
    "\n",
    "print(\"Average per-item interactions for the top 10% popular items {:.2f}\".\n",
    "      format(item_popularity[-ten_percent:].mean()))\n",
    "\n",
    "print(\"Average per-item interactions for the least 10% popular items {:.2f}\".\n",
    "      format(item_popularity[:ten_percent].mean()))\n",
    "\n",
    "print(\"Average per-item interactions for the median 10% popular items {:.2f}\".\n",
    "      format(item_popularity[int(n_items*0.45):int(n_items*0.55)].mean()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of items with zero interactions 0\n"
     ]
    }
   ],
   "source": [
    "print(\"Number of items with zero interactions {}\".\n",
    "      format(np.sum(item_popularity==0)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We compute the user activity (profile length) as the number of interactions in each row\n",
    "\n",
    "### We can exploit the CSR format: the difference between consecutive `indptr` entries is the number of stored elements in each row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEGCAYAAAB2EqL0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAeo0lEQVR4nO3dfZQVd53n8feHhxCIIQmhk4NAgIxsdoiOMfQwycZ11ahBR0N0zG5HNJiJ2zOIGp05Z4Rh14czk3OyO6tr4gjakwc6m9YEH8O6RkXUeMxGSaMxhCQIDg/pBYFkNGAwJJDv/lG/HormdlP3ch+7P69z6lTV99av6tt94H676lf1K0UEZmZmJzKq0QmYmVlrcMEwM7NCXDDMzKwQFwwzMyvEBcPMzAoZ0+gEamXy5Mkxc+bMRqdhZtZSNmzY8FREtJX6bNgWjJkzZ9Lb29voNMzMWoqkHYN95ktSZmZWiAuGmZkV4oJhZmaFuGCYmVkhLhhmZlaIC4aZ2XDQ0wMzZ8KoUdm8p6fqhxi2t9WamY0YPT3Q2QkHD2brO3Zk6wALF1btMD7DMDNrdcuXHy0W/Q4ezOJV5IJhZtbqdu4sL14hFwwzs1Z33nnlxSvkgmFm1upuvBEmTDg2NmFCFq8iFwwzs1a3cCF0dcGMGSBl866uqnZ4g++SMjMbHhYurHqBGMhnGGZmVogLhpnZcFCHB/dqVjAkXSDp4dy0X9KHJU2StFbSljQ/K9dmmaStkjZLuiIXnytpY/rsFkmqVd5mZi2n/8G9HTsg4uiDe1UuGjUrGBGxOSIuioiLgLnAQeDrwFJgXUTMBtaldSTNATqAC4H5wApJo9PuVgKdwOw0za9V3mZmLWeYPbh3OfCriNgBLAC6U7wbuCotLwDujohDEbEN2ArMkzQFmBgRD0ZEAHfm2piZ2TB7cK8D+FJaPjcidgOk+TkpPhV4MtemL8WmpuWB8eNI6pTUK6l33759VUzfzKyJDZcH9ySdAlwJfPlEm5aIxRDx44MRXRHRHhHtbW0l32FuZjb8DKMH994M/Cwi9qT1PekyE2m+N8X7gOm5dtOAXSk+rUTczMygbg/u1aNgXMPRy1EAa4BFaXkRcG8u3iFpnKRZZJ3b69NlqwOSLkl3R12ba2NmZpAVh+3b4cUXs3kNHuKr6ZPekiYAbwT+Ihe+CVgt6XpgJ3A1QERskrQaeAw4DCyJiCOpzWJgFTAeuC9NZmZWR8puPBp+2tvbo7e3t9FpmJm1FEkbIqK91Gd+0tvMzApxwTAzs0JcMMzMrBAXDDMzK8QFw8zMCnHBMDOzQlwwzMysEBcMMzMrxAXDzMwKccEwM7NCXDDMzKwQFwwzMyvEBcPMzApxwTAzs0JcMMzMrBAXDDMzK8QFw8zMCnHBMDOzQmpaMCSdKekrkp6Q9LikSyVNkrRW0pY0Pyu3/TJJWyVtlnRFLj5X0sb02S2SVMu8zczseLU+w7gZ+HZE/FvglcDjwFJgXUTMBtaldSTNATqAC4H5wApJo9N+VgKdwOw0za9x3mZmNkDNCoakicBrgNsAIuL5iPgtsADoTpt1A1el5QXA3RFxKCK2AVuBeZKmABMj4sGICODOXBszM6uTWp5hnA/sA+6Q9HNJt0o6DTg3InYDpPk5afupwJO59n0pNjUtD4wfR1KnpF5Jvfv27avuT2NmNsLVsmCMAS4GVkbEq4BnSZefBlGqXyKGiB8fjOiKiPaIaG9rays3XzMzG0ItC0Yf0BcRP03rXyErIHvSZSbSfG9u++m59tOAXSk+rUTczMzqqGYFIyJ+DTwp6YIUuhx4DFgDLEqxRcC9aXkN0CFpnKRZZJ3b69NlqwOSLkl3R12ba2NmZnUypsb7/yDQI+kU4J+B68iK1GpJ1wM7gasBImKTpNVkReUwsCQijqT9LAZWAeOB+9JkZmZ1pOzGo+Gnvb09ent7G52GmVlL
kbQhItpLfeYnvc3MrBAXDDMzK8QFw8zMCnHBMDOzQlwwzMysEBcMMzMrxAXDzMwKccEwM7NCXDDMzKwQFwwzMyvEBcPMzApxwTAzs0JcMMzMrBAXDDOz4aCnB2bOhFGjsnlPT9UPUev3YZiZWa319EBnJxw8mK3v2JGtAyxcWLXD+AzDzKzVLV9+tFj0O3gwi1fRCQuGpD+QNC4tv1bShySdWdUszMyscjt3lhevUJEzjK8CRyS9DLgNmAV8sapZmJlZ5c47r7x4hYoUjBcj4jDwduAzEfERYEqRnUvaLmmjpIcl9abYJElrJW1J87Ny2y+TtFXSZklX5OJz0362SrpFksr7Mc3MhrEbb4QJE46NTZiQxauoSMF4QdI1wCLgmyk2toxjvC4iLsq9I3YpsC4iZgPr0jqS5gAdwIXAfGCFpNGpzUqgE5idpvllHN/MbHhbuBC6umDGDJCyeVdXVTu8oVjBuA64FLgxIrZJmgXcdRLHXAB0p+Vu4Kpc/O6IOBQR24CtwDxJU4CJEfFgRARwZ66NmZlBVhy2b4cXX8zmVS4WUOC22oh4DPhQbn0bcFPB/QfwXUkBfCEiuoBzI2J32tduSeekbacCP8m17UuxF9LywPhxJHWSnYlwXpWv3ZmZjXQnLBiSLgM+AcxI2wuIiDi/wP4vi4hdqSislfTEUIcqEYsh4scHs4LUBdDe3l5yGzMzq0yRB/duAz4CbACOlLPziNiV5nslfR2YB+yRNCWdXUwB9qbN+4DpuebTgF0pPq1E3MzM6qhIH8YzEXFfROyNiKf7pxM1knSapNP7l4E3AY8Ca8g60Enze9PyGqBD0rjUTzIbWJ8uXx2QdEm6O+raXBszM6uTImcYP5D0D8DXgEP9wYj42QnanQt8Pd0BOwb4YkR8W9JDwGpJ1wM7gavT/jZJWg08BhwGlkRE/xnNYmAVMB64L01mZlZHym48GmID6QclwhERr69NStXR3t4evb29jU7DzKylSNqQewziGEXuknpd9VMyM7NWU2QsqTMkfVpSb5o+JemMeiRnZmbNo0in9+3AAeA/pmk/cEctkzIzs+ZTpNP7DyLiz3Lrn5T0cK0SMjOz5lTkDOP3kl7dv5Ie5Pt97VIyM7NmVKRgLAY+l0ae3QH8I/CXtU3LzMzK0gyvaI2Ih4FXSpqY1vdXPQszM6tcTw/8+Z/D889n6zt2ZOtQ1UEIB30OQ9K7I+IuSX9V6vOI+HTVsqgBP4dhZiPG5MnwdIkBOM4+G556qqxdVfocxmlpfnqJzzywn5lZsyhVLIaKV2jQghERX0iL34uIB/KfpY5vMzMbQYp0en+2YMzMzBrh7LPLi1do0DMMSZcC/w5oG9CPMREYXbqVmZnV3c03w3XXwQsvHI2NHZvFq2ioM4xTgJeQFZXTc9N+4J1VzcLMzCq3cCHcccex7/S+446qv6Z1qD6M+4H7Ja2KiB1VPaqZmVXXAw9AXx9EZPMHHqh6wSjSh3GrpDP7VySdJek7Vc3CzMwq9/73w8qVcCS9QujIkWz9/e+v6mGKFIzJEfHb/pWI+A1wTlWzMDOzynV1lRevUJGC8aKk8/pXJM3Az2GYmTWP/jOLovEKFRmtdjnwY0n3p/XXAJ1VzcLMzCo3enTp4jC6uje0nvAMIyK+DVwM3AOsBuZGROE+DEmjJf1c0jfT+iRJayVtSfOzctsuk7RV0mZJV+TicyVtTJ/dovSicDMzAzoH+Rt+sHiFilySAjgC7AWeAeZIek0Zx7gBeDy3vhRYFxGzgXVpHUlzgA7gQmA+sEJSf3lcSXZWMztN88s4vpnZ8LZiBSxefPSMYvTobH3FiqoepsgrWt8H/Aj4DvDJNP9EkZ1Lmgb8KXBrLrwA6E7L3cBVufjdEXEoIrYBW4F5kqYAEyPiwchGSrwz18bMzCArDocPZ7fVHj5c9WIBxc4wbgD+GNgREa8DXgXsK7j/zwB/A7yYi50bEbsB0rz/jqupwJO57fpSbGpa
Hhg/jqTO/neP79tXNEUzMyuiSMF4LiKeA5A0LiKeAC44USNJbwX2RsSGgrmU6peIIeLHByO6IqI9Itrb2toKHtbMzIoocpdUX3pw7xvAWkm/AXYVaHcZcKWktwCnAhMl3QXskTQlInany017+48DTM+1n5aO05eWB8bNzKyOitwl9faI+G1EfAL4r8BtFOhDiIhlETEtImaSdWZ/PyLeDawBFqXNFgH3puU1QIekcZJmkXVur0+XrQ5IuiTdHXVtro2ZmUHjX9EqaRTwSES8HP51fKmTdROwWtL1wE7g6rTvTZJWA48Bh4ElEdF/Y/FiYBUwHrgvTWZmBllx6OyEgwez9R07jt5SW49XtP7rBlIPsCwidlbtqHXgV7Sa2Ygxc2ZWJAaaMQO2by9rV5W+orXfFGCTpPXAs/3BiLiyrCzMzKw2ShWLoeIVKlIwPlnVI5qZWXXVaWiQExaMiLg/DTg4OyK+J2kCfuOemVnzqNPgg0We9P7PwFeAL6TQVLJbbM3MrBnMmFFevEJFHtxbQvZMxX6AiNiC34dhZtY8brwRJkw4NjZhQhavoiIF41BEPN+/ImkMfh+GmVnzWLgQFi06dvDBRYsa8orW+yX9LTBe0huBLwP/u6pZmJlZ5Xp6oLv72Fe0dndX/eG9IgVjKdlggxuBvwC+FRHLq5qFmZlVbvnyow/t9Tt4MItXUZHbaj8YETcD/9QfkHRDipmZWaPV6TmMImcYi0rE3lvVLMzMrHKDPW9Rr+cwJF0DvAuYJWlN7qPTgaermoWZmVWuTs9hDHVJ6v8Cu4HJwKdy8QPAI1XNwszMKjdjxuBjSVXRoAUjInYAO4BLq3pEMzOrrhtvhPe+N3s1a78xY+r/HIakd0jaIukZSfslHZC0v6pZmJlZ5R544NhiAdn6Aw9U9TBFhjffCrwtIh6v6pFrzMObm9mIMWoUlPoul+DFF8va1VDDmxe5S2pPqxULM7MRZbA//E9wQlCuIs9h9Eq6h2zAwUNH84ivVTUTMzNrakXOMCYCB4E3AW9L01tP1EjSqZLWS/qFpE2SPpnikyStTf0iayWdlWuzTNJWSZslXZGLz5W0MX12S3q3t5mZ1VGR92FcV+G+DwGvj4jfSRoL/FjSfcA7gHURcZOkpWRDj3xU0hygA7gQeCnwPUn/Jr3XeyXQCfwE+BYwH7/X28wsc/nlsG5d6XgVDfXg3mcZYlTaiPjQUDuOrDf9d2l1bJoCWAC8NsW7gR8CH03xuyPiELAtdbbPk7QdmBgRD6a87gSuwgXDzCyzdWt58QoNdYZx0rcYSRoNbABeBnwuIn4q6dyI2A0QEbsl9b9bYyrZGUS/vhR7IS0PjJuZGcDOneXFKzTUg3vdJ7vzdDnpIklnAl+X9PIhNi/VLxFDxI/fgdRJdumK8847r8xszcxa1KRJ8HSJEZsmTarqYYp0ep+0iPgt2aWn+cAeSVMA0nxv2qwPmJ5rNg3YleLTSsRLHacrItojor2tra2qP4OZWdN67rny4hWqWcGQ1JbOLJA0HngD8ASwhqMj4C4C7k3La4AOSeMkzQJmA+vT5asDki5Jd0ddm2tjZmbPPltevEJFnsOo1BSgO/VjjAJWR8Q3JT0IrJZ0PbATuBogIjZJWg08BhwGlqRLWgCLgVXAeLLObnd4m5nVWZGhQWYBHwRmkiswEXFlTTM7SR4axMxGjKEeTSvzae+hhgYpcobxDeA2svd4lzcoiZmZDRtFCsZzEXFLzTMxM7OmVqRg3Czp48B3OXYsqZ/VLCszM2s6RQrGK4D3AK/n6CWpSOtmZjZCFCkYbwfOj4jna52MmZk1ryLPYfwCOLPWiZiZWXMrcoZxLvCEpIc4tg+jqW+rNTOz6ipSMD5e8yzMzKzpFXkfxv31SMTMzJrbCQuGpAMcHR32FLL3WjwbERNrmZiZmTWXImcYp+fXJV0FzKtZRmZm1pTKHq02Ir6Bn8EwMxtxilySekdudRTQzhCvbjUzs+Gp
yF1Sb8stHwa2k71/28zMRpAifRjX1SMRMzNrboMWDEkfG6JdRMTf1SAfMzNrUkOdYZR6t99pwPXA2YALhpnZCDJowYiIT/UvSzoduAG4Drgb+NRg7czMbHga8rZaSZMk/T3wCFlxuTgiPhoRe0+0Y0nTJf1A0uOSNkm6IbfPtZK2pPlZuTbLJG2VtFnSFbn4XEkb02e3SEO9j9DMzGph0IIh6R+Ah4ADwCsi4hMR8Zsy9n0Y+OuI+EPgEmCJpDnAUmBdRMwG1qV10mcdwIXAfGCFpNFpXyuBTmB2muaXkYeZmVXBUGcYfw28FPgvwC5J+9N0QNL+E+04Inb3v5UvIg4AjwNTyW7J7U6bdQNXpeUFwN0RcSgitgFbgXmSpgATI+LBiAjgzlwbMzOrk6H6MMp+CnwwkmYCrwJ+CpwbEbvTMXZLOidtNhX4Sa5ZX4q9kJYHxs3MrI6qVhQGI+klwFeBD0fEUGcmpfolYoh4qWN1SuqV1Ltv377ykzUzs0HVtGBIGktWLHoi4mspvCddZiLN+zvQ+4DpuebTgF0pPq1E/DgR0RUR7RHR3tbWVr0fxMzMalcw0p1MtwGPR8Sncx+tARal5UXAvbl4h6RxkmaRdW6vT5evDki6JO3z2lwbMzOrkyJjSVXqMuA9wEZJD6fY3wI3AaslXQ/sBK4GiIhNklYDj5HdYbUkIo6kdouBVcB44L40mZlZHSm78Wj4aW9vj97e3kanYWZWe0M9mlbmd7ykDRHRXuqzmnd6m5nZ8OCCYWZmhbhgmJm1sje8oW6HcsEwM2tl69bV7VAuGGZmVogLhpmZFeKCYWZmhbhgmJkNV1V+zs4Fw8zMCnHBMDOzQlwwzMysEBcMMzMrxAXDzMwKccEwM2tVQ41SWwMuGGZmVogLhpmZFeKCYWZmhbhgmJm1ohP1XyxeXPVD1qxgSLpd0l5Jj+ZikyStlbQlzc/KfbZM0lZJmyVdkYvPlbQxfXaLVOdeHjOzVrRiRdV3WcszjFXA/AGxpcC6iJgNrEvrSJoDdAAXpjYrJI1ObVYCncDsNA3cp5mZ1UHNCkZE/Aj4lwHhBUB3Wu4GrsrF746IQxGxDdgKzJM0BZgYEQ9GRAB35tqYmY1MDbrQUu8+jHMjYjdAmp+T4lOBJ3Pb9aXY1LQ8MG5mZnXWLJ3epcplDBEvvROpU1KvpN59+/ZVLTkzs5ZS5WHN+9W7YOxJl5lI870p3gdMz203DdiV4tNKxEuKiK6IaI+I9ra2tqombmbWFBp430+9C8YaYFFaXgTcm4t3SBonaRZZ5/b6dNnqgKRL0t1R1+bamJmNLA2+SXRMrXYs6UvAa4HJkvqAjwM3AaslXQ/sBK4GiIhNklYDjwGHgSURcSTtajHZHVfjgfvSZGZmpdTochSAooY7b6T29vbo7e1tdBpmZtVR9OziJL/TJW2IiPZSnzVLp7eZmQ2maLG4/PKapuGCYWbWzMrpt/je92qXBy4YZmbNq5xiUYfuhZp1epuZWYWadMg8Fwwzs2ZRaaGo081LLhhmZo12MmcUdbzT1QXDzKxRTvbSU50fi3DBMDOrp2r1TzTgGToXDDOzWqpFB3aDHrh2wTAzq6Za3+HUwNE5XDDMzCpR71tfm2AYJxcMM7PBNMPzEE1QKPq5YJjZyNQMxWAoTVQo+rlgmNnw0OwFoKgmLBT9XDDMrPGGy5d9pZq4SOS5YJjZ0Eb6l3kttEiBGMgFw6yR/GU8MrRogRjIw5tb5Xp6si88T5VPNrxElJ6GiZY5w5A0H7gZGA3cGhE31eAgVd+lmQ0zw6gAlKslzjAkjQY+B7wZmANcI2lOlQ9S1d2ZWQsa7AxhGJ4tVKIlCgYwD9gaEf8cEc8DdwMLGpyTmTWzsWOLFQAXg8JapWBMBZ7Mrfel2DEkdUrqldS7b9++uiVnZjVQ7pf9wOn55xv9Eww7rVIwSl0vOu7PgYjoioj2iGhva2urQ1pmI9TJfpn7r/2W1Cqd3n3A9Nz6
NGBXg3Ixqy1/WVqTapWC8RAwW9Is4P8BHcC7qnqECHd8V4O/7MyGrZYoGBFxWNIHgO+Q3VZ7e0RsqsGBqr5LM7PhoiUKBkBEfAv4VqPzMDMbqVql09vMzBrMBcPMzApxwTAzs0JcMMzMrBDFML0zSNI+YEeFzScDT1UxnVpqpVzB+dZSK+UKrZVvK+UKJ5fvjIgo+eTzsC0YJ0NSb0S0NzqPIlopV3C+tdRKuUJr5dtKuULt8vUlKTMzK8QFw8zMCnHBKK2r0QmUoZVyBedbS62UK7RWvq2UK9QoX/dhmJlZIT7DMDOzQlwwzMysEBeMHEnzJW2WtFXS0jof+3ZJeyU9motNkrRW0pY0Pyv32bKU52ZJV+TicyVtTJ/dImVjtksaJ+meFP+ppJknket0ST+Q9LikTZJuaNZ8JZ0qab2kX6RcP9msuQ7Ie7Skn0v6ZrPnK2l7Os7DknqbOV9JZ0r6iqQn0r/fS5s41wvS77R/2i/pww3NNyI8Zf04o4FfAecDpwC/AObU8fivAS4GHs3F/juwNC0vBf5bWp6T8hsHzEp5j06frQcuJXtL4X3Am1P8/cDn03IHcM9J5DoFuDgtnw78MuXUdPmm/b4kLY8Ffgpc0oy5Dsj7r4AvAt9s5n8LaR/bgckDYk2ZL9ANvC8tnwKc2ay5Dsh7NPBrYEYj863Ll2ErTOmX+Z3c+jJgWZ1zmMmxBWMzMCUtTwE2l8qN7D0hl6ZtnsjFrwG+kN8mLY8hewpUVcr7XuCNzZ4vMAH4GfAnzZwr2Rsl1wGv52jBaOZ8t3N8wWi6fIGJwLaBbZsx1xK5vwl4oNH5+pLUUVOBJ3PrfSnWSOdGxG6AND8nxQfLdWpaHhg/pk1EHAaeAc4+2QTTKeyryP5yb8p80+Wdh4G9wNqIaNpck88AfwO8mIs1c74BfFfSBkmdTZzv+cA+4I50ue9WSac1aa4DdQBfSssNy9cF46hS72dt1nuOB8t1qJ+h6j+fpJcAXwU+HBH7h9p0kGPXJd+IOBIRF5H95T5P0suH2LyhuUp6K7A3IjYUbTLIsev5b+GyiLgYeDOwRNJrhti2kfmOIbvsuzIiXgU8S3ZJZzDN8LtF0inAlcCXT7TpIMeuWr4uGEf1AdNz69OAXQ3Kpd8eSVMA0nxvig+Wa19aHhg/po2kMcAZwL9UmpiksWTFoicivtbs+QJExG+BHwLzmzjXy4ArJW0H7gZeL+muJs6XiNiV5nuBrwPzmjTfPqAvnWECfIWsgDRjrnlvBn4WEXvSesPydcE46iFgtqRZqaJ3AGsanNMaYFFaXkTWV9Af70h3OMwCZgPr0+npAUmXpLsgrh3Qpn9f7wS+H+nCZbnSvm8DHo+ITzdzvpLaJJ2ZlscDbwCeaMZcASJiWURMi4iZZP8Gvx8R727WfCWdJun0/mWya+2PNmO+EfFr4ElJF6TQ5cBjzZjrANdw9HLUwGPUN9+T7YwZThPwFrI7fn4FLK/zsb8E7AZeIKv615NdS1wHbEnzSbntl6c8N5PueEjxdrL/sL8C/pGjT/OfSnZKu5XsjonzTyLXV5Odtj4CPJymtzRjvsAfAT9PuT4KfCzFmy7XErm/lqOd3k2ZL1m/wC/StKn//00T53sR0Jv+PXwDOKtZc037mwA8DZyRizUsXw8NYmZmhfiSlJmZFeKCYWZmhbhgmJlZIS4YZmZWiAuGmZkV4oJhI56k5cpGsn0kjQr6J2W2f6+kl5bZZqZyIxPn4q9VGqE2F1sl6Z3l7N+sFsY0OgGzRpJ0KfBWstF3D0maTDaKadH2o4H3kt3j3uiRAQYlaUxkYwWZVcxnGDbSTQGeiohDABHxVKShLiRdngap26jsfSXjUny7pI9J+jHZU7jtQE86Oxmf3j1wfxqM7zu5YRzmKnsvx4PAkkqSlXSTpMfS2dD/SLE2SV+V9FCaLkvxT0jqkvRd4M6T+zWZuWCYfReYLumXklZI+g+QvXgJWAX8p4h4BdnZ+OJc
u+ci4tURcRfZk8MLIxvg8DDwWeCdETEXuB24MbW5A/hQRFxaSaKSJgFvBy6MiD8C/j59dDPwPyPij4E/A27NNZsLLIiId1VyTLM8X5KyES0ifidpLvDvgdcB9yh72+LPgW0R8cu0aTfZWcFn0vo9g+zyAuDlwNps2B5GA7slnQGcGRH3p+3+F9mgcselNFiqwH7gOeBWSf8H6O/reAMwJx0PYGL/+E7Amoj4/SD7NCuLC4aNeBFxhGwU2x9K2kg2GNvDJ2j27CBxAZsGnkWkARCLjMPzNNn4RnmTyC6bHZY0j2zQvA7gA2QvWRpF9hKcYwpDKiCD5WlWNl+SshFN2XuTZ+dCFwE7yEa0nSnpZSn+HuD+ge2TA2SvqoVs0Le21JmOpLGSLoxsaPVnJL06bbdwkH1tAV4q6Q9T+xnAK4GHlb1/5IyI+Bbw4ZQrZJfVPpD7mS7CrAZ8hmEj3UuAz6YzgMNko3Z2RsRzkq4DvpzeE/AQ8PlB9rEK+Lyk35O9EvOdwC3pMtQYsstYm4DrgNslHSR7NeZx0p1a7yZ7K9ypZKMXvy8inkmd5/emuICPpGYfAj4n6ZF0vB8Bf1n5r8SsNI9Wa2ZmhfiSlJmZFeKCYWZmhbhgmJlZIS4YZmZWiAuGmZkV4oJhZmaFuGCYmVkh/x9eltero5evVwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_activity = np.ediff1d(URM_all.tocsr().indptr)\n",
    "user_activity = np.sort(user_activity)\n",
    "\n",
    "\n",
    "pyplot.plot(user_activity, 'ro')\n",
    "pyplot.ylabel('Num Interactions ')\n",
    "pyplot.xlabel('Sorted User')\n",
    "pyplot.show()"
   ]
  },
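  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A minimal sketch of the `indptr` trick on a toy matrix\n",
    "\n",
    "The names `toy_URM` and `toy_activity` are hypothetical and introduced only for this illustration. In CSR format, `indptr[i]:indptr[i+1]` delimits the stored entries of row `i`, so the difference between consecutive `indptr` entries should give the number of interactions of each user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Hypothetical toy URM: 3 users (rows), 4 items (columns)\n",
    "toy_URM = sps.csr_matrix(np.array([[5, 0, 3, 0],\n",
    "                                   [0, 0, 0, 0],\n",
    "                                   [1, 2, 0, 4]]))\n",
    "\n",
    "# Row pointers: user 0 owns entries 0:2, user 1 owns 2:2 (none), user 2 owns 2:5\n",
    "print(toy_URM.indptr)\n",
    "\n",
    "# Consecutive differences count the stored entries of each row: 2, 0 and 3\n",
    "toy_activity = np.ediff1d(toy_URM.indptr)\n",
    "print(toy_activity)"
   ]
  },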
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In order to evaluate our recommender we have to define:\n",
    "* A split of the data into URM_train and URM_test\n",
    "* An evaluation metric\n",
    "* A function computing the evaluation for each user\n",
    "\n",
    "### The splitting of the data is very important to ensure your algorithm is evaluated in a realistic scenario, by testing it on data it has never seen. We create two splits:\n",
    "#### - Train data: we will use this to train our model\n",
    "#### - Test data: we will use this to evaluate our model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ True,  True,  True, ...,  True,  True,  True])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_test_split = 0.80\n",
    "\n",
    "n_interactions = URM_all.nnz\n",
    "\n",
    "\n",
    "train_mask = np.random.choice([True,False], n_interactions, p=[train_test_split, 1-train_test_split])\n",
    "train_mask"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<69878x10677 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 7999092 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
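   ],
   "source": [
    "# Pass the shape explicitly: otherwise it is inferred from the largest index in the split\n",
    "URM_train = sps.csr_matrix((URM_all.data[train_mask],\n",
    "                            (URM_all.row[train_mask], URM_all.col[train_mask])),\n",
    "                           shape=URM_all.shape)\n",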
    "\n",
    "URM_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<69878x10669 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 2000962 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
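   ],
   "source": [
    "test_mask = np.logical_not(train_mask)\n",
    "\n",
    "# Pass the shape explicitly so that URM_test has the same shape as URM_all\n",
    "URM_test = sps.csr_matrix((URM_all.data[test_mask],\n",
    "                            (URM_all.row[test_mask], URM_all.col[test_mask])),\n",
    "                          shape=URM_all.shape)\n",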
    "\n",
    "URM_test"
   ]
  },
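  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A small sanity check for a random split, on hypothetical toy data\n",
    "\n",
    "This is only a sketch under stated assumptions: the names `toy_all`, `toy_train` and `toy_test`, the toy interactions and the seed `42` are ours. It checks that every interaction ends up in exactly one of the two splits and that both splits keep the shape of the full matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Fixed seed (our choice) so that the split is reproducible\n",
    "rng = np.random.default_rng(42)\n",
    "\n",
    "# Hypothetical toy URM in COO format: 6 interactions, 4 users, 5 items\n",
    "toy_rows = np.array([0, 0, 1, 2, 3, 3])\n",
    "toy_cols = np.array([0, 4, 1, 2, 0, 3])\n",
    "toy_all = sps.coo_matrix((np.ones(6), (toy_rows, toy_cols)), shape=(4, 5))\n",
    "\n",
    "# Each interaction goes to train with probability 0.8, otherwise to test\n",
    "toy_train_mask = rng.random(toy_all.nnz) < 0.8\n",
    "toy_test_mask = np.logical_not(toy_train_mask)\n",
    "\n",
    "toy_train = sps.csr_matrix((toy_all.data[toy_train_mask],\n",
    "                            (toy_all.row[toy_train_mask], toy_all.col[toy_train_mask])),\n",
    "                           shape=toy_all.shape)\n",
    "\n",
    "toy_test = sps.csr_matrix((toy_all.data[toy_test_mask],\n",
    "                           (toy_all.row[toy_test_mask], toy_all.col[toy_test_mask])),\n",
    "                          shape=toy_all.shape)\n",
    "\n",
    "# Both splits keep the full shape and together they rebuild the original matrix\n",
    "assert toy_train.shape == toy_test.shape == toy_all.shape\n",
    "assert toy_train.nnz + toy_test.nnz == toy_all.nnz\n",
    "assert np.array_equal((toy_train + toy_test).toarray(), toy_all.toarray())"
   ]
  },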
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluation metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We call items in the test set 'relevant'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  22,   24,   38,   75,  137,  170,  175,  181,  228,  238,  373,\n",
       "        394,  406,  418, 1077, 1283, 1318, 1496])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_id = 124\n",
    "relevant_items = URM_test[user_id].indices\n",
    "relevant_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Say that we have a recommendation list such as this"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 241, 1622,   15,  857, 5823])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recommended_items = np.array([241, 1622, 15, 857, 5823])\n",
    "recommended_items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False, False])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "is_relevant = np.in1d(recommended_items, relevant_items, assume_unique=True)\n",
    "is_relevant"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Precision: how many of the recommended items are relevant"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "def precision(recommended_items, relevant_items):\n",
    "    \n",
    "    is_relevant = np.in1d(recommended_items, relevant_items, assume_unique=True)\n",
    "    \n",
    "    precision_score = np.sum(is_relevant, dtype=np.float32) / len(is_relevant)\n",
    "    \n",
    "    return precision_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Recall: how many of the relevant items we were able to recommend"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "def recall(recommended_items, relevant_items):\n",
    "    \n",
    "    is_relevant = np.in1d(recommended_items, relevant_items, assume_unique=True)\n",
    "    \n",
    "    recall_score = np.sum(is_relevant, dtype=np.float32) / relevant_items.shape[0]\n",
    "    \n",
    "    return recall_score"
   ]
  },
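  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mean Average Precision (MAP): the function below computes the Average Precision of a single user, averaging it over all users gives the MAP"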
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "def MAP(recommended_items, relevant_items):\n",
    "   \n",
    "    is_relevant = np.in1d(recommended_items, relevant_items, assume_unique=True)\n",
    "    \n",
    "    # Cumulative sum: precision at 1, at 2, at 3 ...\n",
    "    p_at_k = is_relevant * np.cumsum(is_relevant, dtype=np.float32) / (1 + np.arange(is_relevant.shape[0]))\n",
    "    \n",
    "    map_score = np.sum(p_at_k) / np.min([relevant_items.shape[0], is_relevant.shape[0]])\n",
    "\n",
    "    return map_score"
   ]
  },
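  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A worked example of the three metrics on a toy recommendation list\n",
    "\n",
    "The item ids and the `toy_` names below are hypothetical. The sketch recomputes precision, recall and average precision step by step, following the same formulas as the functions above, and uses `np.isin`, which behaves like `np.in1d` on 1-D arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data: 5 recommended items, 3 relevant items\n",
    "toy_recommended = np.array([10, 20, 30, 40, 50])\n",
    "toy_relevant = np.array([10, 30, 99])\n",
    "\n",
    "# Hits at positions 1 and 3: [True, False, True, False, False]\n",
    "toy_hits = np.isin(toy_recommended, toy_relevant)\n",
    "\n",
    "# Precision: 2 hits over 5 recommended items\n",
    "toy_precision = toy_hits.sum() / len(toy_recommended)\n",
    "\n",
    "# Recall: 2 hits over 3 relevant items\n",
    "toy_recall = toy_hits.sum() / len(toy_relevant)\n",
    "\n",
    "# Precision at each cutoff, kept only where there is a hit: [1, 0, 2/3, 0, 0]\n",
    "toy_p_at_k = toy_hits * np.cumsum(toy_hits) / (1 + np.arange(len(toy_hits)))\n",
    "\n",
    "# Average precision: (1 + 2/3) / min(3, 5)\n",
    "toy_ap = toy_p_at_k.sum() / min(len(toy_relevant), len(toy_hits))\n",
    "\n",
    "print('Precision = {:.4f}, Recall = {:.4f}, AP = {:.4f}'.format(\n",
    "    toy_precision, toy_recall, toy_ap))"
   ]
  },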
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now that we have the data, we can build our first recommender. We need two things:\n",
    "* a 'fit' function to train our model\n",
    "* a 'recommend' function that uses our model to recommend\n",
    "\n",
    "### Let's start with a random recommender"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### In a random recommender we don't have anything to learn from the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "class RandomRecommender(object):\n",
    "\n",
    "    def fit(self, URM_train):\n",
    "           \n",
    "        self.n_items = URM_train.shape[1]\n",
    "    \n",
    "    \n",
    "    def recommend(self, user_id, at=5):\n",
    "    \n",
    "        # Sample without replacement so that no item is recommended twice to the same user\n",
    "        recommended_items = np.random.choice(self.n_items, at, replace=False)\n",
    "\n",
    "        return recommended_items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[7986 3801 7338 2923 9124]\n",
      "[ 5218 10421  7432  2762  2155]\n",
      "[ 6380  4376  8399  7341 10076]\n",
      "[3477 8786 3880 9180   94]\n",
      "[ 1271  1832   471 10085 10276]\n",
      "[3703 4564 6829 5507 3698]\n",
      "[7149 1980 3795 7475 1637]\n",
      "[9023 9658  976 6547 7774]\n",
      "[2663 4822 7007 9476 7527]\n",
      "[ 6599  6326  3601  9543 10289]\n"
     ]
    }
   ],
   "source": [
    "randomRecommender = RandomRecommender()\n",
    "randomRecommender.fit(URM_train)\n",
    "\n",
    "for user_id in range(10):\n",
    "    print(randomRecommender.recommend(user_id, at=5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Put it all together in an evaluation function and let's test it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We pass the recommender object (an already fitted instance) as a parameter\n",
    "\n",
    "def evaluate_algorithm(URM_test, recommender_object, at=5):\n",
    "    \n",
    "    cumulative_precision = 0.0\n",
    "    cumulative_recall = 0.0\n",
    "    cumulative_MAP = 0.0\n",
    "    \n",
    "    num_eval = 0\n",
    "\n",
    "\n",
    "    for user_id in range(URM_test.shape[0]):\n",
    "\n",
    "        relevant_items = URM_test.indices[URM_test.indptr[user_id]:URM_test.indptr[user_id+1]]\n",
    "        \n",
    "        if len(relevant_items)>0:\n",
    "            \n",
    "            recommended_items = recommender_object.recommend(user_id, at=at)\n",
    "            num_eval+=1\n",
    "\n",
    "            cumulative_precision += precision(recommended_items, relevant_items)\n",
    "            cumulative_recall += recall(recommended_items, relevant_items)\n",
    "            cumulative_MAP += MAP(recommended_items, relevant_items)\n",
    "            \n",
    "    cumulative_precision /= num_eval\n",
    "    cumulative_recall /= num_eval\n",
    "    cumulative_MAP /= num_eval\n",
    "    \n",
    "    print(\"Recommender performance is: Precision = {:.4f}, Recall = {:.4f}, MAP = {:.4f}\".format(\n",
    "        cumulative_precision, cumulative_recall, cumulative_MAP)) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.0029, Recall = 0.0005, MAP = 0.0014\n"
     ]
    }
   ],
   "source": [
    "evaluate_algorithm(URM_test, randomRecommender)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### So the code works. The performance however..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Top Popular recommender\n",
    "\n",
    "#### We recommend to all users the most popular items, that is those with the highest number of interactions\n",
    "#### In this case our model is the item popularity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TopPopRecommender(object):\n",
    "\n",
    "    def fit(self, URM_train):\n",
    "\n",
    "        # Popularity is computed on the training data only, to avoid leaking test interactions\n",
    "        item_popularity = np.ediff1d(URM_train.tocsc().indptr)\n",
    "\n",
    "        # We are not interested in sorting the popularity value,\n",
    "        # but to order the items according to it\n",
    "        self.popular_items = np.argsort(item_popularity)\n",
    "        self.popular_items = np.flip(self.popular_items, axis = 0)\n",
    "    \n",
    "    \n",
    "    def recommend(self, user_id, at=5):\n",
    "    \n",
    "        recommended_items = self.popular_items[0:at]\n",
    "\n",
    "        return recommended_items\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now train and test our model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "topPopRecommender = TopPopRecommender()\n",
    "topPopRecommender.fit(URM_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n"
     ]
    }
   ],
   "source": [
    "for user_id in range(10):\n",
    "    print(topPopRecommender.recommend(user_id, at=5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.0954, Recall = 0.0307, MAP = 0.0526\n"
     ]
    }
   ],
   "source": [
    "evaluate_algorithm(URM_test, topPopRecommender, at=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### That's better, but we can improve\n",
    "\n",
    "### Hint: remove the items already seen by the user. We can either filter them out of the recommended item list, or assign them a score so low that they end up at the very bottom of the ranking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TopPopRecommender(object):\n",
    "\n",
    "    def fit(self, URM_train):\n",
    "        \n",
    "        self.URM_train = URM_train\n",
    "\n",
    "        # Popularity is computed on the training data only, to avoid leaking test interactions\n",
    "        item_popularity = np.ediff1d(URM_train.tocsc().indptr)\n",
    "\n",
    "        # We are not interested in sorting the popularity value,\n",
    "        # but to order the items according to it\n",
    "        self.popular_items = np.argsort(item_popularity)\n",
    "        self.popular_items = np.flip(self.popular_items, axis = 0)\n",
    "    \n",
    "    \n",
    "    def recommend(self, user_id, at=5, remove_seen=True):\n",
    "\n",
    "        if remove_seen:\n",
    "            seen_items = self.URM_train.indices[self.URM_train.indptr[user_id]:self.URM_train.indptr[user_id+1]]\n",
    "            \n",
    "            unseen_items_mask = np.in1d(self.popular_items, seen_items,\n",
    "                                        assume_unique=True, invert = True)\n",
    "\n",
    "            unseen_items = self.popular_items[unseen_items_mask]\n",
    "\n",
    "            recommended_items = unseen_items[0:at]\n",
    "\n",
    "        else:\n",
    "            recommended_items = self.popular_items[0:at]\n",
    "            \n",
    "\n",
    "        return recommended_items\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1008  139 1293   22  175]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7   14 1293   22]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7   14 1293   22]\n",
      "[1008    7  139   14 1293]\n",
      "[1008    7  139   14 1293]\n",
      "[  14 1293   22   19   24]\n"
     ]
    }
   ],
   "source": [
    "topPopRecommender_removeSeen = TopPopRecommender()\n",
    "topPopRecommender_removeSeen.fit(URM_train)\n",
    "\n",
    "for user_id in range(10):\n",
    "    print(topPopRecommender_removeSeen.recommend(user_id, at=5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.1977, Recall = 0.0530, MAP = 0.1468\n"
     ]
    }
   ],
   "source": [
    "evaluate_algorithm(URM_test, topPopRecommender_removeSeen)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Simple but effective. Always remove seen items if your purpose is to recommend \"new\" ones"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Global effects recommender\n",
    "\n",
    "#### We recommend to all users the highest rated items\n",
    "\n",
    "#### First we compute the average of all ratings, or global average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The global average is 3.51\n"
     ]
    }
   ],
   "source": [
    "globalAverage = np.mean(URM_train.data)\n",
    "\n",
    "print(\"The global average is {:.2f}\".format(globalAverage))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We subtract the bias to all ratings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1.48788181 1.48788181 1.48788181 1.48788181 1.48788181 1.48788181\n",
      " 1.48788181 1.48788181 1.48788181 1.48788181]\n"
     ]
    }
   ],
   "source": [
    "URM_train_unbiased = URM_train.copy()\n",
    "\n",
    "URM_train_unbiased.data -= globalAverage\n",
    "\n",
    "print(URM_train_unbiased.data[0:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Then we compute the average rating for each item, or itemBias"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[-1.74643806e-02, -6.55957996e-02, -1.18630771e-01, ...,\n",
       "         -2.16394028e-05,  6.98190864e-06,  6.98190864e-06]])"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "item_mean_rating = URM_train_unbiased.mean(axis=0)\n",
    "item_mean_rating"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEGCAYAAAB7DNKzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAXGUlEQVR4nO3dfbBc9X3f8fdXF8lw8QMPkh1ZQroQNHjkh2K4xSbx1I0xDiYZy2kyNc7FBuNWNTYOqWea4qpxmng047RpprgBy7c2mJhbQ+y6jkxJsU0MmYwfokvLs8Eodi4okHDBxSbIRkj69o9zZK0ue492V7t7dve+XzM7u+d7zp79/vRwP/fsOfvbyEwkSVrMsrobkCQNNoNCklTJoJAkVTIoJEmVDApJUqWj6m6gF1auXJkTExN1tyFJQ+OOO+54IjNXNVs3kkExMTHB7Oxs3W1I0tCIiLnF1vnWkySpkkEhSapkUEiSKhkUkqRKBoUkqZJBIUnDbmYGJiZg2bLifmamq7sfyctjJWnJmJmBzZth9+5ieW6uWAaYmurKS3hEIUnDbMuWgyFxwO7dRb1LDApJGmYPP9xevQMGhSQNs3Xr2qt3wKCQpGG2dSuMjx9aGx8v6l1iUEjSMJuagulpWL8eIor76emuncgGr3qSpOE3NdXVYFjIIwpJUiWDQpJUyaCQJFUyKCRJlQwKSVIlg0KSVMmgkCRVMigkSZVqDYqIOC8iHoyInRFxRZP1myLi7oi4MyJmI+INdfQpSUtZbZ/Mjogx4CrgXGAXsCMitmfm/Q2b3Qpsz8yMiNcAfwK8ov/dStLSVecRxVnAzsz8XmbuAW4ANjVukJn/kJlZLh4LJJKkvqozKNYAjzQs7yprh4iIX4mIB4D/BVyy2M4iYnP59tTs/Px815uVpKWqzqCIJrXnHTFk5v/MzFcAbwc+utjOMnM6Myczc3LVqlVdbFOSlrY6g2IXcFLD8lrg0cU2zsy/AH42Ilb2ujFJ0kF1BsUOYENEnBwRK4ALgO2NG0TEqRER5eMzgBXAk33vVJKWsNquesrMvRFxGXALMAZck5n3RcT7yvXbgF8F3h0RzwE/Bt7RcHJbktQHMYo/dycnJ3N2drbuNiRpaETEHZk52Wydn8yWJFUyKCRJlQwKSVIlg0KSVMmgkKRhNzMDExOwbFlxPzPT1d3XdnmsJKkLZmZg82bYvbtYnpsrlgGmprryEh5RSNIw27LlYEgcsHt3Ue8Sg0KShtnDD7dX74BBIUnDbN269uodMCgkaZht3QorVhxaW7GiqHeJQSFJw27hVExdnprJoJCkYbZlCzz33KG1557zZLYkqTQ31169AwaFJA2zsbH26h0wKCRpmO3b1169AwaFJA2z9evbq3fAoJCkYbZ1K4yPH1obH/fyWElSaWoKpqeLI4iI4n56umvzPIGTAkrS8Jua6mowLOQRhSQNO6cZlyQtamYGLrkE9uwplufmimVwmnFJEnD55QdD4oA9e4p6lxgUkjTMnnyyvXoHDApJUiWDQpKG2YkntlfvgEEhScPsyith+fJDa8uXF/UuMSgkaZhNTcG11x76gbtrr/UDd5KkBn7gTpJUJ4NCklTJoJAkVTIoJGnYOdeTJGlRMzOweTPs3l0sz80VyzAacz1FxHkR8WBE7IyIK5qsn4qIu8vbNyLiH9XRpyQNrC1bDobEAbt3F/UuqS0oImIMuAp4K7AReGdEbFyw2feBN2bma4CPAtP97VKSBtzDD7dX70CdRxRnATsz83uZuQe4AdjUuEFmfiMz/1+5+C1gbZ97lKTBtm5de/UO1BkUa4BHGpZ3lbXFvBf4s8VWRsTmiJiNiNn5+fkutShJA+7889urd6DOoIgmtWy6YcQvUATFv11sZ5k5nZmTmTm5atWqLrUoSQPu5pvbq3egzquedgEnNSyvBR5duFFEvAb4FPDWzOzeBOuSNArm
5tqrd6DOI4odwIaIODkiVgAXANsbN4iIdcAXgXdl5ndr6FGSBtuyRX6ML1bvQG1HFJm5NyIuA24BxoBrMvO+iHhfuX4b8BHgRODqiADYm5mTdfUsSQNn//726h2IzKanBYba5ORkzs7O1t2GJPVeNDvdW2rj53tE3LHYL+JO4SFJw2yxoKgKkDYZFJI0zBY7aujiu0UGhSQNs7Gx9uodMCgkaZjt29devQMGhSQNs/Xr26t3wKCQpGF26qnt1TtgUEjSMLvttvbqHTAoJGmYeY5CklQ3g0KSVMmgkKRh5lVPkqRKW7fC+PihtfHxot4lBoUkDbOpKZieLo4gIor76emi3iVtTTMeEccDJ2Xm3V3rQJJ0ZKamuhoMCx32iCIibouIF0fECcBdwLUR8Yc960iSNFBaeevpJZn5I+CfAddm5pnAm3vbliRpULQSFEdFxGrgnwM39bgfSVK7ZmZgYqL4+tOJiWK5i1o5R/F7FF9X+peZuSMiTgEe6moXkqTOzMzAJZfAnj3F8txcsQxdO2/hV6FK0jBbuRKefPL59RNPhCeeaHk3VV+Fetgjiog4Gngv8Erg6AP1zLyk5Q4kSb3RLCSq6h1o5RzFZ4GfAX4RuB1YCzzdtQ4kSQOtlaA4NTN/G3gmM68Dfgl4dW/bkiQNilaC4rny/qmIeBXwEmCiZx1JkgZKK1c9TZefyP5tYDvwQuAjPe1KknR4Xb4MdjGHDYrM/FT58HbglN62I0lq2eWX9+VlFg2KiLgwM6+PiA81W5+ZTuMhSXXq4pVNVaqOKI4t71/Uj0YkSV20YkXXdrVoUGTmJ8v73+3aq0mS+uOaa7q2q0WveoqIoyPiooh4WxR+KyJuiogrI2Jl1zqQJHVfF6cdr7o89o+BtwCXALcB64E/oviw3We61oEkaaBVnaPYmJmvioijgF2Z+cay/r8j4q4+9CZJGgBVRxR7ADJzL/DognX7etaRJGmgVB1RrI2IjwPR8JhyeU3PO5MkVTv2WHjmmeb1LqoKin/T8HjhnN1dmcM7Is4DrgTGgE9l5scWrH8FcC1wBrAlM/+gG68rSSNh79726h2qujz2uq6+0gIRMQZcBZwL7AJ2RMT2zLy/YbMfAL8BvL2XvUjSUHr22fbqHWplUsBeOQvYmZnfy8w9wA3ApsYNMvPxzNzBwYkJJUl9VmdQrAEeaVjexRGc+4iIzRExGxGz8/PzR9ycJKlQZ1BEk1rH38uamdOZOZmZk6tWrTqCtiRJjVr5KtSTgQ9SfAfFT7fPzLcd4WvvAk5qWF7L8y/DlSTVrJXvo/gS8Gngy8D+Lr72DmBDGUR/C1wA/HoX9y9J6oJWguInmfnxw2/WnszcGxGXAbdQXB57TWbeFxHvK9dvi4ifobgU98XA/oj4TYpPjP+o2/1IkpprJSiujIjfAb4C/PSaq8z8P0f64pl5M3Dzgtq2hsd/R/GWlCSpJq0ExauBdwFv4uBbT1kuS5LqEgHZ5BqgaHatUOdaCYpfAU4pP+sgSRoUzUKiqt6hVi6PvQs4rquvKkkaGq0cUbwMeCAidnDoOYojvTxWkjQEWgmK3+l5F5KkgXXYoMjM2yNiPbAhM78WEeMUl7NKkury/vf37aUOe44iIv4l8AXgk2VpDcWH8CRJdfnEJ/r2Uq2czP4A8PPAjwAy8yHgpb1sSpI0OFoJimcbL40tv0O7u9deSZIGVitBcXtE/DvgmIg4F/g8xbxPkqRBdM45Xd1dK0FxBTAP3AP8K+DmzNzS1S4kSd3zta91dXetXB77wcy8EvhvBwoRcXlZkySNuFaOKC5qUru4y31Iklo1M9PXl1v0iCIi3knx/RAnR8T2hlUvAp7sdWOSpEVcfHFfX67qradvAI8BK4H/3FB/Gri7l01Jkirs3dvXl1s0KDJzDpgDzu5fO5KkQVP11tPTNP+8RACZmS/uWVeSpM5cemnXd1l1RPGirr+aJOnIHO5E9tVXd/0lW7nqSZI0KC68sO8v
aVBIkioZFJKkSgaFJA2L44+vXr+sNz/SDQpJGhZPPVW9ft++nrysQSFJw6CP32i3kEEhScOgj99ot5BBIUmDLuLw2/Tgg3YHGBSSNMhaCQnoyQftDjAoJGlQtRoSPdbKFxdJkvqp3YDIZtPydY9BIUmDYECOHpoxKCSpLt0Ihx4fTYBBIUn90Ysjhj6EBBgUknRk6nrLqE8hATVf9RQR50XEgxGxMyKuaLI+IuLj5fq7I+KMOvpUl83MFP+5vHkbhVsd+hgSUOMRRUSMAVcB5wK7gB0RsT0z72/Y7K3AhvL2OuAT5b2gvn+kkuqxfDns2dP3l63ziOIsYGdmfi8z9wA3AJsWbLMJ+OMsfAs4LiJW97vRnhnG32Qk1SOzlpCAeoNiDfBIw/KustbuNgBExOaImI2I2fn5+a422rHxcX/QS+pc5sFbjeoMimY/KRf+abSyTVHMnM7MycycXLVq1RE317YVK54fBD/+cf/7kDTcBiQcGtV51dMu4KSG5bXAox1sUw+PCCQdqQEKgyp1HlHsADZExMkRsQK4ANi+YJvtwLvLq59eD/wwMx/rd6OH8G0jSVUajwgOdxsStR1RZObeiLgMuAUYA67JzPsi4n3l+m3AzcD5wE5gN/Ceuvo1HHpsiP7TSEtNrR+4y8ybKcKgsbat4XECH+h3X4dYswYeHYx3uw7rmGNg9+66u5A0YvxkdpU6jiL8zVrSgDEoFtOrkDAIJA0Zg6KZV76yO/sxFCSNAIOimfvvP/w2zRgMkkaQX4W60NhYe9ufc87QXeomSe3wiGKh/ftb39ZwkLQEGBSdMCAkLSG+9dSolSudDAlJS4xB0Y5l/nFJWnr8ydeOffvq7kCS+s6gOKBbn52QpBFjUBxwuM9OXH99f/qQpAFjULRqaqruDiSpFgaFJKmSQdGKjRvr7kCSamNQtOK+++ruQJJqY1BIkioZFJKkSgaFJKmSQSFJqmRQSJIqGRQAxx9fdweSNLAMCoCnnqq7A0kaWAaFJKmSQXE4l15adweSVCuD4nCuvrruDiSpVgaFJKmSQSFJqmRQSJIqGRSSpEoGhSSpkkEhSapkUEiSKtUSFBFxQkR8NSIeKu+bTrYUEddExOMRcW9PG1qxor26JC0hdR1RXAHcmpkbgFvL5WY+A5zX82727GmvLklLSF1BsQm4rnx8HfD2Zhtl5l8AP+hXU5Kk56srKF6WmY8BlPcvPdIdRsTmiJiNiNn5+fn2nnziie3VJWkJ6VlQRMTXIuLeJrdNvXi9zJzOzMnMnFy1alV7T77ySli+/NDa8uVFXZKWuKN6tePMfPNi6yLi7yNidWY+FhGrgcd71UdLpqaK+y1b4OGHYd062Lr1YF2SlrCeBcVhbAcuAj5W3v9pTX0cNDVlMEhSE3Wdo/gYcG5EPAScWy4TES+PiJsPbBQRnwO+CZwWEbsi4r21dCtJS1gtQZGZT2bmOZm5obz/QVl/NDPPb9junZm5OjOXZ+bazPx0z5qamYGJCVi2rLifmenZS0nSMKnrrafBMjMDl1xy8HMTc3PFMvh2lKQlzyk8AC6//Pkfrtuzp6hL0hJnUAA8+WR7dUlaQgwKSVIlg0KSVMmgAIhory5JS4hBAZDZXl2SlhCDAmBsrL26JC0hBgXAvn3t1SVpCTEoANavb68uSUuIQQHFTLHj44fWxseLuiQtcQYFFNN0TE8XRxARxf30tNN3SBLO9XSQ04xLUlMeUUiSKhkUkqRKBoUkqZJBIUmqZFBIkipFjuB8RhExD8x1+PSVwBNdbGcQOcbR4BhHw6CMcX1mrmq2YiSD4khExGxmTtbdRy85xtHgGEfDMIzRt54kSZUMCklSJYPi+abrbqAPHONocIyjYeDH6DkKSVIljygkSZUMCklSJYOiFBHnRcSDEbEzIq6ou592RMRJEfH1iPhORNwXEZeX9RMi4qsR8VB5f3zDcz5cjvXBiPjFhvqZEXFPue7jERF1jGkxETEWEf83Im4q
l0dqjBFxXER8ISIeKP8+zx7BMf7r8t/pvRHxuYg4ehTGGBHXRMTjEXFvQ61r44qIF0TEjWX92xEx0bfBZeaSvwFjwF8DpwArgLuAjXX31Ub/q4EzyscvAr4LbAT+I3BFWb8C+P3y8cZyjC8ATi7HPlau+yvgbCCAPwPeWvf4Foz1Q8B/B24ql0dqjMB1wL8oH68AjhulMQJrgO8Dx5TLfwJcPApjBP4JcAZwb0Ota+MC3g9sKx9fANzYt7HV/Q9nEG7lX8otDcsfBj5cd19HMJ4/Bc4FHgRWl7XVwIPNxgfcUv4ZrAYeaKi/E/hk3eNp6GctcCvwpoagGJkxAi8uf4jGgvoojXEN8AhwAsX34dwEvGVUxghMLAiKro3rwDbl46MoPs0dvRpL4823ngoH/vEesKusDZ3ycPS1wLeBl2XmYwDl/UvLzRYb75ry8cL6oPgvwG8B+xtqozTGU4B54Nry7bVPRcSxjNAYM/NvgT8AHgYeA36YmV9hhMa4QDfH9dPnZOZe4IfAiT3rvIFBUWj23ubQXTccES8E/gfwm5n5o6pNm9Syol67iPhl4PHMvKPVpzSpDfQYKX5LPAP4RGa+FniG4u2KxQzdGMv36DdRvN3ycuDYiLiw6ilNagM9xhZ1Mq7axmxQFHYBJzUsrwUeramXjkTEcoqQmMnML5blv4+I1eX61cDjZX2x8e4qHy+sD4KfB94WEX8D3AC8KSKuZ7TGuAvYlZnfLpe/QBEcozTGNwPfz8z5zHwO+CLwc4zWGBt1c1w/fU5EHAW8BPhBzzpvYFAUdgAbIuLkiFhBcaJoe809tay8KuLTwHcy8w8bVm0HLiofX0Rx7uJA/YLyKoqTgQ3AX5WHxk9HxOvLfb674Tm1yswPZ+bazJyg+Pv588y8kNEa498Bj0TEaWXpHOB+RmiMFG85vT4ixsvezgG+w2iNsVE3x9W4r1+j+D/Qn6Oouk/+DMoNOJ/iaqG/BrbU3U+bvb+B4hD0buDO8nY+xfuXtwIPlfcnNDxnSznWB2m4WgSYBO4t1/0RfTpZ1uZ4/ykHT2aP1BiB04HZ8u/yS8DxIzjG3wUeKPv7LMWVP0M/RuBzFOddnqP47f+93RwXcDTweWAnxZVRp/RrbE7hIUmq5FtPkqRKBoUkqZJBIUmqZFBIkioZFJKkSgaFlrSI2FLOZHp3RNwZEa9r8/kXR8TL23zOROMMo83qEXF6RJzfzn6lXjmq7gakukTE2cAvU8y8+2xErKSYsbXV549RzHx6L93/VPDpFNfT39zl/Upt84hCS9lq4InMfBYgM5/IzEcBIuKccmK+e8rvGXhBWf+biPhIRPwlxcyek8BMeTRyTPldArdHxB0RcUvD9A1nRsRdEfFN4ANVTZWzA/we8I5yv++IiGPLPnaUfW0qt704Ir4UEV+OiO9HxGUR8aFym29FxAk9+rPTEmJQaCn7CnBSRHw3Iq6OiDcCRMTRwGeAd2TmqymOvC9teN5PMvMNmXk9xaeopzLzdGAv8F+BX8vMM4FrgK3lc64FfiMzzz5cU5m5B/gIxfcNnJ6ZN1J8ivfPM/MfA78A/KdyZlmAVwG/DpxVvt7uLCYV/CbFFBDSETEotGRl5j8AZwKbKab3vjEiLgZOo5i47rvlptdRfCnNATcussvTKH5ofzUi7gT+PbA2Il4CHJeZt5fbfbaDdt8CXFHu9zaK6RzWleu+nplPZ+Y8xdTTXy7r91B8P4J0RDxHoSUtM/dR/OC9LSLuoZh07c7DPO2ZReoB3LfwqCEijuPIp4MO4Fcz88EF+34d8GxDaX/D8n78P64u8IhCS1ZEnBYRGxpKpwNzFBPWTUTEqWX9XcDtC59fepri62ehmNxtVXmSnIhYHhGvzMyngB9GxBvK7aZaaK9xv1B8u9kHyxlFiYjXtrAPqSsMCi1lLwSui4j7I+Juiu8x/g+Z+RPgPcDny6OM/cC2RfbxGWBb+ZbQGMX0z78fEXdRHJn8XLnde4Cr
ypPZP26ht68DGw+czAY+CiwH7i4vof1o+8OVOuPssZKkSh5RSJIqGRSSpEoGhSSpkkEhSapkUEiSKhkUkqRKBoUkqdL/B4gZU+OsIZWXAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "item_mean_rating = np.array(item_mean_rating).squeeze()\n",
    "item_mean_rating = np.sort(item_mean_rating[item_mean_rating!=0])\n",
    "\n",
    "pyplot.plot(item_mean_rating, 'ro')\n",
    "pyplot.ylabel('Item Bias')\n",
    "pyplot.xlabel('Sorted Item')\n",
    "pyplot.show()"
   ]
  },
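  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A note on how the sparse mean is computed\n",
    "\n",
    "The `.mean(axis=0)` of a sparse matrix divides each column sum by the total number of rows (users), so every missing rating counts as a zero. This is consistent with the small bias values shown above. If we wanted the average over the ratings that were actually given, one possible sketch is to divide each column sum by the number of stored ratings. This is an alternative shown only for comparison, not what the recommenders in this notebook do, and the names `toy_ratings`, `mean_over_all_users` and `mean_over_observed` are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Toy rating matrix (hypothetical data): 3 users x 3 items, 0 means no rating\n",
    "toy_ratings = sps.csr_matrix(np.array([\n",
    "    [5, 0, 1],\n",
    "    [4, 0, 0],\n",
    "    [0, 0, 3],\n",
    "], dtype=np.float64))\n",
    "\n",
    "# Sparse mean: column sum divided by the number of users, missing ratings included\n",
    "mean_over_all_users = np.array(toy_ratings.mean(axis=0)).squeeze()\n",
    "\n",
    "# Alternative: column sum divided by the number of stored ratings of each item\n",
    "toy_ratings_csc = toy_ratings.tocsc()\n",
    "rating_sum = np.array(toy_ratings_csc.sum(axis=0)).squeeze()\n",
    "rating_count = np.ediff1d(toy_ratings_csc.indptr)\n",
    "\n",
    "# Items without any rating keep a mean of 0 instead of raising a division warning\n",
    "mean_over_observed = np.divide(rating_sum, rating_count,\n",
    "                               out=np.zeros_like(rating_sum),\n",
    "                               where=rating_count > 0)\n",
    "\n",
    "print(mean_over_all_users)\n",
    "print(mean_over_observed)"
   ]
  },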
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### And the average rating for each user, or userBias"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[ 0.00209031],\n",
       "        [-0.00044076],\n",
       "        [ 0.00077   ],\n",
       "        ...,\n",
       "        [ 0.00333601],\n",
       "        [ 0.0013138 ],\n",
       "        [-0.00117385]])"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_mean_rating = URM_train_unbiased.mean(axis=1)\n",
    "user_mean_rating"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAW00lEQVR4nO3de7AmdZ3f8feHuchFuYyMOFwHK1PuskZRzyIWZuMCs0FCxNRaWdxR0dWa0miJ2SQWZKq21kqsYjcp4zWyUwqO4awaVzdMKRsEFC2zXjgoCgg4uDI6xciMEEEd5frNH91nORzOc/pc5rmcOe9X1VPd/evu5/c9p848n+lf99OdqkKSpNkcNOwCJEmjz7CQJHUyLCRJnQwLSVInw0KS1GnlsAvoh6OPPrrWr18/7DIkacm46aabflZVa3utPyDDYv369UxMTAy7DElaMpLsnG29w1CSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkLXXj47B+PRx0UDMdH9/vXRyQl85K0rIxPg6bN8O+fc3yzp3NMsCmTfutG48sJGkp27LliaCYtG9f074fGRaStJTt7PFdul7tCzTUsEhyTpI7k9yV5OIZ1v9Wkq8neSjJfxhGjZI00g7q8THeq32BhnbOIskK4MPARmAXcGOS7VX1/Smb3Q+8A3jVEEqUpNH3+OPza1+gYR5ZnAbcVVX/UFUPA58Czp+6QVXtqaobgUeGUaAkqTHMsDgO+MmU5V1t24Ik2ZxkIsnE3r17F12cJC0Jhx02v/YFGmZYZIa2WuibVdXWqhqrqrG1a3veZVeSDiwHHzy/9gUaZljsAk6Ysnw8cM+QapGkpen+++fXvkDDDIsbgQ1JTk6yGrgA2D7EeiRp6VmzZn7tCzS0q6Gq6tEkbweuAVYAl1fVbUne0q6/LMmzgQngcODxJO8ETqmqB4dVtyQtR0O93UdVXQ1cPa3tsinzP6UZnpIkzWQZDENJkhZrGVwNJUlarF/+cn7tC2RYSJI6GRaSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkqZNhIUnqZFhIkjoZFpKkToaFJKmTYSFJ6mRYSJI6GRaSpE6GhSSpk2EhSepkWEjSUrV69cC6Miwkaal65JGBdWVYSJI6GRaStBQlA+3OsJCkpWbAQQGwcuA9SpIWZj4hUbVfuzYsJGlUDeEIohfDQpKGYYSCYC4MC0maq6XyAb+fh6DAsJC0EEvlQ3M56kNQwJDDIsk5wPuBFcBHq+rSaevTrj8X2Ae8oaq+PfBCDyT+I5cOXH0KChhiWCRZAXwY2AjsAm5Msr2qvj9ls1cAG9rXS4CPtNOlzw9tSftTH4MChvs9i9OAu6rqH6rqYeBTwPnTtjkf+EQ1vgEcmWTdoAudl2RuL0naH6r6HhQw3LA4DvjJlOVdbdt8twEgyeYkE0km9u7du18L7ckQkDRIb33rE+EwoJCYNMxzFjN9sk7/yeeyTdNYtRXYCjA2Nta/36CBIKlfBvjhP1/DDItdwAlTlo8H7lnANoNhSEiayQh/wO9PwxyGuhHYkOTkJKuBC4Dt07bZDrw+jdOBB6pq96ALNSikDqec8tThkeXyWiaGdmRRVY8meTtwDc2ls5dX1W1J3tKuvwy4muay2btoLp1940CLXL16oPeLH5pl9AcvaWGG+j2LqrqaJhCmtl02Zb6Atw26LmA4QeGHtqQR5Te4e9mfQWEISFriDIuZLOYchcEg6QBkWEy3kKAwICQd4AyLxTAkJC0TPlZ1Ia680qCQtKwYFlPNdQhq06b+1iFJI8awmC+PKCQtQ4bFfBgUkpYpw2LS6tXDrkCSRpZhMWk53NZDkhbIsJgrh6AkLWOGhSSpk2EhSepkWMyFQ1CSljnDQpLUybCQJHUyLCRJnQwLgBUrhl2BJI00wwLg8ceHXYEkjTTDQpLUybCQJHUyLLqcddawK5CkoTMs
ulx33bArkKShMywkSZ0MC0lSJ8NCktTJsJAkdRpKWCRZk+TaJDva6VE9trs8yZ4ktw66RknSE4Z1ZHExcH1VbQCub5dn8nHgnEEVJUma2bDC4nxgWzu/DXjVTBtV1VeB+wdVlCRpZp1hkeSMJIe1869N8t4kJy2y32OqajdAO33WIt9PktRHczmy+AiwL8kLgHcBO4FPdO2U5Lokt87wOn+RNffqb3OSiSQTe/fu7UcXkrRsrZzDNo9WVbUf8u+vqo8lubBrp6o6u9e6JPcmWVdVu5OsA/bMo+Ze/W0FtgKMjY35HFRJ2o/mcmTxiySXAK8FvpBkBbBqkf1uByYD50LgqkW+nySpj+YSFn8EPAS8qap+ChwH/NdF9nspsDHJDmBju0ySY5NcPblRkk8CXweem2RXkjctsl9J0gKk6sAbsRkbG6uJiYm575D0XncA/n4kabokN1XVWK/1c7ka6vQkNyb5ZZKHkzyW5IH9W6YkaZTNZRjqQ8BrgB3AIcCbgQ/3syhJ0miZy9VQVNVdSVZU1WPAFUn+vs91SZJGyFzCYl+S1cDNSf4S2A0c1t+yJEmjZC7DUK8DVgBvB34FnAD8YT+LkiSNls4ji6ra2c7+Gnh3f8uRJI2inmGR5H9V1b9JcgvwlOtHq+r5fa1MkjQyZjuyuKidnjeIQiRJo6tnWEy5K+zkMBRJjgbuqwPxm3ySpJ56nuBuv4x3Q5LPJXlh+7S6W4F7k/hAIklaRmYbhvoQ8J+AI4AvAa+oqm8k+S3gk8D/GUB9kqQRMNulsyur6otV9Rngp1X1DYCqumMwpUmSRsVsYfH4lPlfT1vnOQtJWkZmG4Z6QZIHgQCHtPO0ywf3vTJJ0siY7WqoFYMsRJI0uuZyuw9J0jJnWEiSOhkWkqROs4ZFkhVJrhtUMUPztKfNr12SlplZw6J92NG+JEcMqJ7heOih+bVL0jIzl4cf/Qa4Jcm1NM+zAKCq3tG3qiRJI2UuYfGF9iVJWqbm8vCjbUkOAU6sqjsHUJMkacR0Xg2V5F8BN9PeODDJqUm297swSdLomMuls38OnAb8HKCqbgZO7mNNg/fMZ86vXZKWmbmExaNV9cC0tgPrRoLvfz+sWvXktlWrmnZJ0pzC4tYkfwysSLIhyQeBv+9zXYO1aRNccQWcdBIkzfSKK5p2SRLpekJqkkOBLcAf0Nxx9hrgP1fVb/pf3sKMjY3VxMTEsMuQpCUjyU1VNdZrfeeRRVXtq6otVfW7wEuAv1hsUCRZk+TaJDva6VEzbHNCki8nuT3JbUkuWkyfncbHYf16OOigZjo+3tfuJGkpmcvVUH+d5PAkhwG3AXcm+Y+L7Pdi4Pqq2gBc3y5P9yjw76vqt4HTgbclOWWR/c5sfBw2b4adO6GqmW7ebGBIUmsu5yxOqaoHgVcBVwMnAq9bZL/nA9va+W3tez9JVe2uqm+3878AbgeOW2S/M9uyBfbte3Lbvn1NuyRpTmGxKskqmg/0q6rqERZ/NdQxVbUbmlAAnjXbxknWAy8EvjnLNpuTTCSZ2Lt37/yq+fGP59cuScvMXMLir4C7gcOAryY5CXhw1j2AJNcluXWG1/nzKTDJ04HPAu9sj3BmVFVbq2qsqsbWrl07ny7gxBPn1y5Jy8xcbvfxAeADk8tJfgz8/hz2O7vXuiT3JllXVbuTrAP29NhuFU1QjFfV57r6XLD3vKc5RzF1KOrQQ5t2SVLvsEjyp9OaCvgZ8LWq+tEi+90OXAhc2k6vmqH/AB8Dbq+q9y6yv9lNfp9iy5Zm6OnEE5ug8HsWkgTMPgz1jGmvw4Ex4O+SXLDIfi8FNibZAWxsl0lybJKr223OoDmRfmaSm9vXuYvst7dNm+Duu+Hxx5upQSFJ/6jnkUVVvXum9iRrgOuATy2006q6DzhrhvZ7gHPb+a/RfAlwMMbHPbKQpB7m8jyLJ6mq+9shogPH+Dj8yZ/Aww83yzt3NstgYEgSc7sa6kmSnAn8vz7UMjwX
XfREUEx6+OGmXZI06wnuW3jq9ynWAPcAr+9nUQN3333za5ekZWa2Yajzpi0XcF9V/WqmjSVJB67ZTnDvHGQhQ/XMZ858FOHDjyQJWMA5iwOSDz+SpFkZFuDDjySpw7wvnT1gbdpkOEhSDx5ZSJI6GRaSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkqZNhIUnqZFhIkjoZFpKkToaFJKmTYSFJ6mRYSJI6GRaSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkqZNhIUnqNJSwSLImybVJdrTTo2bY5uAk30ry3SS3JXn3MGqVJA3vyOJi4Pqq2gBc3y5P9xBwZlW9ADgVOCfJ6QOsUZLUGlZYnA9sa+e3Aa+avkE1ftkurmpfNZjyJElTDSssjqmq3QDt9FkzbZRkRZKbgT3AtVX1zV5vmGRzkokkE3v37u1L0ZK0XK3s1xsnuQ549gyrtsz1ParqMeDUJEcCf5vkeVV1a49ttwJbAcbGxjwCkaT9qG9hUVVn91qX5N4k66pqd5J1NEcOs73Xz5PcAJwDzBgWkqT+GdYw1Hbgwnb+QuCq6RskWdseUZDkEOBs4I6BVShJ+kfDCotLgY1JdgAb22WSHJvk6nabdcCXk3wPuJHmnMXnh1KtJC1zfRuGmk1V3QecNUP7PcC57fz3gBcOuDRJ0gz8BrckqZNhIUnqZFhIkjoZFpKkToaFJKmTYSFJ6mRYSJI6GRaSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkqZNhIUnqZFhIkjoZFpKkToaFJKmTYSFJ6mRYSJI6GRaSpE6GhSSpk2EhSepkWEiSOhkWkqROhoUkqZNhIUnqZFhIkjoNJSySrElybZId7fSoWbZdkeQ7ST4/yBolSU8Y1pHFxcD1VbUBuL5d7uUi4PaBVCVJmtGwwuJ8YFs7vw141UwbJTke+JfARwdUlyRpBsMKi2OqajdAO31Wj+3eB7wLeHxQhUmSnmplv944yXXAs2dYtWWO+58H7Kmqm5K8fA7bbwY2A5x44onzqFSS1KVvYVFVZ/dal+TeJOuqaneSdcCeGTY7A3hlknOBg4HDk1xZVa/t0d9WYCvA2NhYLf4nkCRNGtYw1Hbgwnb+QuCq6RtU1SVVdXxVrQcuAL7UKygkSf01rLC4FNiYZAewsV0mybFJrh5STZKkHvo2DDWbqroPOGuG9nuAc2dovwG4oe+FSZJm5De4JUmdDAtJUifDYtL4OKxfDwcd1EzHx4ddkSSNjKGcsxg54+OweTPs29cs79zZLANs2jS8uiRpRHhkAbBlyxNBMWnfvqZdkmRYAPDjH8+vXZKWGcMCoNftQbxtiCQBhkXjPe+BQw99ctuhhzbtkiTDAmhOYm/dCiedBEkz3brVk9uS1PJqqEmbNhkOktSDRxaSpE6GhSSpk2EhSepkWEiSOhkWkqROqTrwnkCaZC+wc4G7Hw38bD+W009LqVZYWvUupVrBevtpKdUKC6/3pKpa22vlARkWi5FkoqrGhl3HXCylWmFp1buUagXr7aelVCv0r16HoSRJnQwLSVInw+Kptg67gHlYSrXC0qp3KdUK1ttPS6lW6FO9nrOQJHXyyEKS1MmwkCR1MixaSc5JcmeSu5JcPOC+L0+yJ8mtU9rWJLk2yY52etSUdZe0dd6Z5F9MaX9xklvadR9Ikrb9aUk+3bZ/M8n6RdR6QpIvJ7k9yW1JLhrVepMcnORbSb7b1vruUa11Wt0rknwnyedHud4kd7d93JxkYpRrbd/vyCR/k+SO9u/3paNab5Lntr/XydeDSd451Hqratm/gBXAD4HnAKuB7wKnDLD/3wNeBNw6pe0vgYvb+YuBv2jnT2nrexpwclv3inbdt4CXAgH+DnhF2/5vgcva+QuATy+i1nXAi9r5ZwA/aGsauXrb9316O78K+CZw+ijWOq3uPwX+Gvj8iP8t3A0cPa1tJGtt32Mb8OZ2fjVw5CjXO6XuFcBPgZOGWe9APgxH/dX+
Iq+ZsnwJcMmAa1jPk8PiTmBdO78OuHOm2oBr2vrXAXdMaX8N8FdTt2nnV9J8uzP7qe6rgI2jXi9wKPBt4CWjXCtwPHA9cCZPhMVI1svMYTGqtR4O/Gj6/qNa77Qa/wD4v8Ou12GoxnHAT6Ys72rbhumYqtoN0E6f1bb3qvW4dn56+5P2qapHgQeAZy62wPaw9YU0/2MfyXrbIZ2bgT3AtVU1srW23ge8C3h8Stuo1lvAF5PclGTziNf6HGAvcEU7xPfRJIeNcL1TXQB8sp0fWr2GRSMztI3qNcW9ap3tZ9jvP1+SpwOfBd5ZVQ/OtmmPvgdSb1U9VlWn0vyP/bQkz5tl86HWmuQ8YE9V3TTXXXr0Pai/hTOq6kXAK4C3Jfm9WbYddq0raYZ6P1JVLwR+RTOM08uw623eMFkNvBL4TNemPfreb/UaFo1dwAlTlo8H7hlSLZPuTbIOoJ3uadt71bqrnZ/e/qR9kqwEjgDuX2hhSVbRBMV4VX1u1OsFqKqfAzcA54xwrWcAr0xyN/Ap4MwkV45qvVV1TzvdA/wtcNqo1tq+1672yBLgb2jCY1TrnfQK4NtVdW+7PLR6DYvGjcCGJCe3SX4BsH3INW0HLmznL6Q5NzDZfkF7JcPJwAbgW+0h6S+SnN5e7fD6aftMvtergS9VO1A5X+17fwy4vareO8r1Jlmb5Mh2/hDgbOCOUawVoKouqarjq2o9zd/gl6rqtaNYb5LDkjxjcp5mXP3WUawVoKp+CvwkyXPbprOA749qvVO8hieGoKb3Mdh6F3vy5UB5AefSXNnzQ2DLgPv+JLAbeIQm7d9EM3Z4PbCjna6Zsv2Wts47aa9saNvHaP7B/hD4EE98Q/9gmsPYu2iujHjOImp9Gc2h6veAm9vXuaNYL/B84DttrbcCf9a2j1ytM9T+cp44wT1y9dKcA/hu+7pt8t/MKNY6pZ9TgYn27+F/A0eNeL2HAvcBR0xpG1q93u5DktTJYShJUifDQpLUybCQJHUyLCRJnQwLSVInw0LLWpItae5I+7327p4vmef+b0hy7Dz3WZ8pdxie0v7ytHeandL28SSvns/7S/2wctgFSMOS5KXAeTR30X0oydE0dyOd6/4rgDfQXMM+7G/895RkZTX3/pEWzCMLLWfrgJ9V1UMAVfWzam9hkeSs9oZzt6R53sjT2va7k/xZkq/RfLt2DBhvj0oOaZ8d8JX25nrXTLk1w4vTPFfj68DbFlJskkuTfL89CvpvbdvaJJ9NcmP7OqNt//MkW5N8EfjE4n5NkmGh5e2LwAlJfpDkfyT559A8NAn4OPBHVfVPaY7A3zplv99U1cuq6kqabwRvquZmhY8CHwReXVUvBi4H3tPucwXwjqp66UIKTbIG+NfA71TV84H/0q56P/Dfq+p3gT8EPjpltxcD51fVHy+kT2kqh6G0bFXVL5O8GPhnwO8Dn07zlMTvAD+qqh+0m26jORp4X7v86R5v+VzgecC1zW14WAHsTnIEcGRVfaXd7n/S3CDuKSX1KhV4EPgN8NEkXwAmz22cDZzS9gdw+OQ9m4DtVfXrHu8pzYthoWWtqh6juRvtDUluobmx2s0du/2qR3uA26YfPbQ3M5zLfXXuo7lf0VRraIbKHk1yGs0N8C4A3k7zgKSDaB5g86RQaMOjV53SvDkMpWUrzXOON0xpOhXYSXNn2vVJ/knb/jrgK9P3b/2C5vGy0NzAbW174pwkq5L8TjW3R38gycva7Tb1eK8dwLFJfrvd/yTgBcDNaZ4fckRVXQ28s60VmqG0t0/5mU5F6gOPLLScPR34YPs//0dp7r65uap+k+SNwGfa+/zfCFzW4z0+DlyW5Nc0j7F8NfCBduhpJc3Q1W3AG4HLk+yjeZzlU7RXZL2W5mluB9PchfjNVfVAe6L8qrY9wL9rd3sH8OEk32v7+yrwloX/SqSZeddZSVInh6EkSZ0MC0lSJ8NCktTJsJAkdTIsJEmdDAtJUifDQpLU6f8D
ep+1nJpf5H0AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_mean_rating = np.array(user_mean_rating).squeeze()\n",
    "user_mean_rating = np.sort(user_mean_rating[user_mean_rating!=0.0])\n",
    "\n",
    "pyplot.plot(user_mean_rating, 'ro')\n",
    "pyplot.ylabel('User Bias')\n",
    "pyplot.xlabel('Sorted User')\n",
    "pyplot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Now we can sort the items by their itemBias and use the same recommendation principle as in TopPop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GlobalEffectsRecommender(object):\n",
    "\n",
    "    def fit(self, URM_train):\n",
    "        \n",
    "        self.URM_train = URM_train\n",
    "\n",
    "        globalAverage = np.mean(URM_train.data)\n",
    "\n",
    "        URM_train_unbiased = URM_train.copy()\n",
    "        URM_train_unbiased.data -= globalAverage\n",
    "\n",
    "        item_mean_rating = URM_train_unbiased.mean(axis=0)\n",
    "        item_mean_rating = np.array(item_mean_rating).squeeze()\n",
    "\n",
    "        self.bestRatedItems = np.argsort(item_mean_rating)\n",
    "        self.bestRatedItems = np.flip(self.bestRatedItems, axis = 0)\n",
    "\n",
    "        \n",
    "    \n",
    "    def recommend(self, user_id, at=5, remove_seen=True):\n",
    "\n",
    "        if remove_seen:\n",
    "            seen_items = self.URM_train.indices[self.URM_train.indptr[user_id]:self.URM_train.indptr[user_id+1]]\n",
    "            \n",
    "            unseen_items_mask = np.in1d(self.bestRatedItems, seen_items,\n",
    "                                        assume_unique=True, invert = True)\n",
    "\n",
    "            unseen_items = self.bestRatedItems[unseen_items_mask]\n",
    "\n",
    "            recommended_items = unseen_items[0:at]\n",
    "\n",
    "        else:\n",
    "            recommended_items = self.bestRatedItems[0:at]\n",
    "            \n",
    "\n",
    "        return recommended_items\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.1691, Recall = 0.0389, MAP = 0.1226\n"
     ]
    }
   ],
   "source": [
    "globalEffectsRecommender = GlobalEffectsRecommender()\n",
    "globalEffectsRecommender.fit(URM_train)\n",
    "\n",
    "evaluate_algorithm(URM_test, globalEffectsRecommender)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now let's try to combine User bias an item bias"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GlobalEffectsRecommender(object):\n",
    "\n",
    "    def fit(self, URM_train):\n",
    "        \n",
    "        self.URM_train = URM_train\n",
    "\n",
    "        globalAverage = np.mean(URM_train.data)\n",
    "\n",
    "        URM_train_unbiased = URM_train.copy()\n",
    "        URM_train_unbiased.data -= globalAverage\n",
    "        \n",
    "        # User Bias\n",
    "        user_mean_rating = URM_train_unbiased.mean(axis=1)\n",
    "        user_mean_rating = np.array(user_mean_rating).squeeze()\n",
    "        \n",
    "        # In order to apply the user bias we have to change the rating value \n",
    "        # in the URM_train_unbiased inner data structures\n",
    "        # If we were to write:\n",
    "        # URM_train_unbiased[user_id].data -= user_mean_rating[user_id]\n",
    "        # we would change the value of a new matrix with no effect on the original data structure\n",
    "        for user_id in range(len(user_mean_rating)):\n",
    "            start_position = URM_train_unbiased.indptr[user_id]\n",
    "            end_position = URM_train_unbiased.indptr[user_id+1]\n",
    "            \n",
    "            URM_train_unbiased.data[start_position:end_position] -= user_mean_rating[user_id]\n",
    "\n",
    "        # Item Bias\n",
    "        item_mean_rating = URM_train_unbiased.mean(axis=0)\n",
    "        item_mean_rating = np.array(item_mean_rating).squeeze()\n",
    "\n",
    "        self.bestRatedItems = np.argsort(item_mean_rating)\n",
    "        self.bestRatedItems = np.flip(self.bestRatedItems, axis = 0)\n",
    "\n",
    "        \n",
    "    \n",
    "    def recommend(self, user_id, at=5, remove_seen=True):\n",
    "\n",
    "        if remove_seen:\n",
    "\n",
    "            seen_items = self.URM_train.indices[self.URM_train.indptr[user_id]:self.URM_train.indptr[user_id+1]]\n",
    "            \n",
    "            unseen_items_mask = np.in1d(self.bestRatedItems, seen_items,\n",
    "                                        assume_unique=True, invert = True)\n",
    "\n",
    "            unseen_items = self.bestRatedItems[unseen_items_mask]\n",
    "\n",
    "            recommended_items = unseen_items[0:at]\n",
    "\n",
    "\n",
    "        else:\n",
    "            recommended_items = self.bestRatedItems[0:at]\n",
    "            \n",
    "\n",
    "        return recommended_items\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.1691, Recall = 0.0389, MAP = 0.1226\n"
     ]
    }
   ],
   "source": [
    "globalEffectsRecommender = GlobalEffectsRecommender()\n",
    "globalEffectsRecommender.fit(URM_train)\n",
    "\n",
    "evaluate_algorithm(URM_test, globalEffectsRecommender)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The result is identical. User bias is essential in case of rating prediction but not relevant in case of TopK recommendations."
   ]
  },
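  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Why a per-user constant cannot change a per-user ranking\n",
    "\n",
    "One way to see the claim above: a user bias shifts all the scores of one user by the same amount, so the order of that user's items stays the same. The sketch below checks this on toy numbers. The arrays `predicted_ratings` and `user_bias` are hypothetical values made up for the illustration, not quantities computed from MovieLens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical predicted ratings: 2 users x 4 items\n",
    "predicted_ratings = np.array([\n",
    "    [4.5, 3.0, 5.0, 2.0],\n",
    "    [1.5, 2.5, 1.0, 3.5],\n",
    "])\n",
    "\n",
    "# Hypothetical user biases, one constant per user (row)\n",
    "user_bias = np.array([0.8, -1.2])\n",
    "\n",
    "# Subtract each user's bias from all of that user's scores\n",
    "debiased_ratings = predicted_ratings - user_bias[:, np.newaxis]\n",
    "\n",
    "# Rank the items of each user by decreasing score, before and after the shift\n",
    "ranking_before = np.argsort(-predicted_ratings, axis=1)\n",
    "ranking_after = np.argsort(-debiased_ratings, axis=1)\n",
    "\n",
    "print(ranking_before)\n",
    "print(np.array_equal(ranking_before, ranking_after))"
   ]
  },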
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question:\n",
    "\n",
    "#### Why is GlobalEffect performing worse than TopPop even if we are taking into account more information about the interaction?\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    "### The test data contains a lot of low rating interactions... We are testing against those as well, but GlobalEffects is penalizing interactions with low rating"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2., 1., 2., ..., 2., 1., 1.])"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URM_test.data[URM_test.data<=2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In reality we want to recommend items rated in a positive way, so let's build a new Test set with positive interactions only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<69878x10669 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 1723597 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URM_test_positiveOnly = URM_test.copy()\n",
    "\n",
    "URM_test_positiveOnly.data[URM_test.data<=2] = 0\n",
    "URM_test_positiveOnly.eliminate_zeros()\n",
    "URM_test_positiveOnly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Deleted 277365 negative interactions\n"
     ]
    }
   ],
   "source": [
    "print(\"Deleted {} negative interactions\".format(URM_test.nnz - URM_test_positiveOnly.nnz))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run the evaluation again for both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.1877, Recall = 0.0565, MAP = 0.1383\n"
     ]
    }
   ],
   "source": [
    "evaluate_algorithm(URM_test_positiveOnly, topPopRecommender_removeSeen)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Recommender performance is: Precision = 0.1632, Recall = 0.0421, MAP = 0.1183\n"
     ]
    }
   ],
   "source": [
    "evaluate_algorithm(URM_test_positiveOnly, globalEffectsRecommender)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GlobalEffects performs worse again...\n",
    "\n",
    "### Ideas?\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    ".\n",
    "\n",
    "### Sometimes ratings are not really more informative than interactions, depends on their quality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Take-home message: how you build your splitting and the task you are building the algorithm for are tightly interlinked"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
